linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v7 0/2] perf cs-etm: Split Coresight decode by aux records
@ 2021-06-24 16:43 James Clark
  2021-06-24 16:43 ` [PATCH v7 1/2] " James Clark
  2021-06-24 16:43 ` [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records James Clark
  0 siblings, 2 replies; 17+ messages in thread
From: James Clark @ 2021-06-24 16:43 UTC (permalink / raw)
  To: acme, mathieu.poirier, coresight, leo.yan
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose,
	anshuman.khandual, James Clark, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Change applies to perf/core (45237f9898fc)

Changes since v6:
 * Fix for snapshot mode where buffers are wrapped. This fix was done by clamping the aux record
   size to the size of the buffer (see comment).
 * Added an extra debugging printout.
 * Typo/formatting fixes.
 * Add the change for --dump-raw-trace as a second commit. I planned to do this later, but have now
   finished it so I'll submit it at the same time.
 * Did some more thorough testing around the different snapshot scenarios.

Decoding snapshot files with duplicate data is improved by this patchset because of the reason
mentioned at the end of the testing section. Coincidentally, the same issue is also fixed in
"[PATCH v1 0/3] coresight: Fix for snapshot mode" but by not saving duplicates, rather than not
decoding them.

James Clark (2):
  perf cs-etm: Split Coresight decode by aux records
  perf cs-etm: Split --dump-raw-trace by AUX records

 tools/perf/util/cs-etm.c | 188 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 185 insertions(+), 3 deletions(-)

-- 
2.28.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v7 1/2] perf cs-etm: Split Coresight decode by aux records
  2021-06-24 16:43 [PATCH v7 0/2] perf cs-etm: Split Coresight decode by aux records James Clark
@ 2021-06-24 16:43 ` James Clark
  2021-06-27 11:50   ` Leo Yan
  2021-07-05 19:33   ` Mathieu Poirier
  2021-06-24 16:43 ` [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records James Clark
  1 sibling, 2 replies; 17+ messages in thread
From: James Clark @ 2021-06-24 16:43 UTC (permalink / raw)
  To: acme, mathieu.poirier, coresight, leo.yan
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose,
	anshuman.khandual, James Clark, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Populate the auxtrace queues using AUX records rather than whole
auxtrace buffers so that the decoder is reset between each aux record.

This is similar to the auxtrace_queues__process_index() ->
auxtrace_queues__add_indexed_event() flow where
perf_session__peek_event() is used to read AUXTRACE events out of
random positions in the file based on the auxtrace index. But now we
loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For
each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer
using the index, and add a fragment of that buffer to the auxtrace
queues. No other changes to decoding were made, apart from populating
the auxtrace queues. The result of decoding is identical to before,
except in cases where decoding failed completely, due to not resetting
the decoder.

The reason for this change is because AUX records are emitted any time
tracing is disabled, for example when the process is scheduled out.
Because ETM was disabled and enabled again, the decoder also needs to
be reset to force the search for a sync packet. Otherwise there would
be fatal decoding errors.

Testing
=======

Testing was done with the following script, to diff the decoding results
between the patched and un-patched versions of perf:

	#!/bin/bash
	set -ex

	$1 script -i $3 $4 > split.script
	$2 script -i $3 $4 > default.script

	diff split.script default.script | head -n 20

And it was run like this, with various itrace options depending on the
quantity of synthesised events:

	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns

No changes in output were observed in the following scenarios:

* Simple per-cpu
	perf record -e cs_etm/@tmc_etr0/u top

* Per-thread, single thread
	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C

* Per-thread multiple threads (but only one thread collected data):
	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597

* Per-thread multiple threads (both threads collected data):
	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597

* Per-cpu explicit threads:
	perf record -e cs_etm/@tmc_etr0/u --pid 853,854

* System-wide (per-cpu):
    perf record -e cs_etm/@tmc_etr0/u -a

* No data collected (no aux buffers)
	Can happen with any command when run for a short period

* Containing truncated records
	Can happen with any command

* Containing aux records with 0 size
	Can happen with any command

* Snapshot mode (various files with and without buffer wrap)
	perf record -e cs_etm/@tmc_etr0/u -a --snapshot

Some differences were observed in the following scenario:

* Snapshot mode (with duplicate buffers)
	perf record -e cs_etm/@tmc_etr0/u -a --snapshot

Fewer samples are generated in snapshot mode if duplicate buffers
were gathered because buffers with the same offset are now only added
once. This gives different, but more correct results and no duplicate
data is decoded any more.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 168 ++++++++++++++++++++++++++++++++++++++-
 1 file changed, 167 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 64536a6ed10a..88e8122f73c9 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2679,6 +2679,172 @@ static u64 *cs_etm__create_meta_blk(u64 *buff_in, int *buff_in_offset,
 	return metadata;
 }
 
+/**
+ * Puts a fragment of an auxtrace buffer into the auxtrace queues based
+ * on the bounds of aux_event, if it matches with the buffer that's at
+ * file_offset.
+ *
+ * Normally, whole auxtrace buffers would be added to the queue. But we
+ * want to reset the decoder for every PERF_RECORD_AUX event, and the decoder
+ * is reset across each buffer, so splitting the buffers up in advance has
+ * the same effect.
+ */
+static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_offset, size_t sz,
+				      struct perf_record_aux *aux_event, struct perf_sample *sample)
+{
+	int err;
+	char buf[PERF_SAMPLE_MAX_SIZE];
+	union perf_event *auxtrace_event_union;
+	struct perf_record_auxtrace *auxtrace_event;
+	union perf_event auxtrace_fragment;
+	__u64 aux_offset, aux_size;
+
+	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
+						   struct cs_etm_auxtrace,
+						   auxtrace);
+
+	/*
+	 * There should be a PERF_RECORD_AUXTRACE event at the file_offset that we got
+	 * from looping through the auxtrace index.
+	 */
+	err = perf_session__peek_event(session, file_offset, buf,
+				       PERF_SAMPLE_MAX_SIZE, &auxtrace_event_union, NULL);
+	if (err)
+		return err;
+	auxtrace_event = &auxtrace_event_union->auxtrace;
+	if (auxtrace_event->header.type != PERF_RECORD_AUXTRACE)
+		return -EINVAL;
+
+	if (auxtrace_event->header.size < sizeof(struct perf_record_auxtrace) ||
+		auxtrace_event->header.size != sz) {
+		return -EINVAL;
+	}
+
+	/*
+	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
+	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
+	 */
+	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
+			auxtrace_event->cpu != sample->cpu)
+		return 1;
+
+	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
+		/*
+		 * Clamp size in snapshot mode. The buffer size is clamped in
+		 * __auxtrace_mmap__read() for snapshots, so the aux record size doesn't reflect
+		 * the buffer size.
+		 */
+		aux_size = min(aux_event->aux_size, auxtrace_event->size);
+
+		/*
+		 * In this mode, the head also points to the end of the buffer so aux_offset
+		 * needs to have the size subtracted so it points to the beginning as in normal mode
+		 */
+		aux_offset = aux_event->aux_offset - aux_size;
+	} else {
+		aux_size = aux_event->aux_size;
+		aux_offset = aux_event->aux_offset;
+	}
+
+	if (aux_offset >= auxtrace_event->offset &&
+	    aux_offset + aux_size <= auxtrace_event->offset + auxtrace_event->size) {
+		/*
+		 * If this AUX event was inside this buffer somewhere, create a new auxtrace event
+		 * based on the sizes of the aux event, and queue that fragment.
+		 */
+		auxtrace_fragment.auxtrace = *auxtrace_event;
+		auxtrace_fragment.auxtrace.size = aux_size;
+		auxtrace_fragment.auxtrace.offset = aux_offset;
+		file_offset += aux_offset - auxtrace_event->offset + auxtrace_event->header.size;
+
+		pr_debug3("CS ETM: Queue buffer size: %#"PRI_lx64" offset: %#"PRI_lx64
+			  " tid: %d cpu: %d\n", aux_size, aux_offset, sample->tid, sample->cpu);
+		return auxtrace_queues__add_event(&etm->queues, session, &auxtrace_fragment,
+						  file_offset, NULL);
+	}
+
+	/* Wasn't inside this buffer, but there were no parse errors. 1 == 'not found' */
+	return 1;
+}
+
+static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event,
+					u64 offset __maybe_unused, void *data __maybe_unused)
+{
+	struct perf_sample sample;
+	int ret;
+	struct auxtrace_index_entry *ent;
+	struct auxtrace_index *auxtrace_index;
+	struct evsel *evsel;
+	size_t i;
+
+	/* Don't care about any other events, we're only queuing buffers for AUX events */
+	if (event->header.type != PERF_RECORD_AUX)
+		return 0;
+
+	if (event->header.size < sizeof(struct perf_record_aux))
+		return -EINVAL;
+
+	/* Truncated Aux records can have 0 size and shouldn't result in anything being queued. */
+	if (!event->aux.aux_size)
+		return 0;
+
+	/*
+	 * Parse the sample, we need the sample_id_all data that comes after the event so that the
+	 * CPU or PID can be matched to an AUXTRACE buffer's CPU or PID.
+	 */
+	evsel = evlist__event2evsel(session->evlist, event);
+	if (!evsel)
+		return -EINVAL;
+	ret = evsel__parse_sample(evsel, event, &sample);
+	if (ret)
+		return ret;
+
+	/*
+	 * Loop through the auxtrace index to find the buffer that matches up with this aux event.
+	 */
+	list_for_each_entry(auxtrace_index, &session->auxtrace_index, list) {
+		for (i = 0; i < auxtrace_index->nr; i++) {
+			ent = &auxtrace_index->entries[i];
+			ret = cs_etm__queue_aux_fragment(session, ent->file_offset,
+							 ent->sz, &event->aux, &sample);
+			/*
+			 * Stop search on error or successful values. Continue search on
+			 * 1 ('not found')
+			 */
+			if (ret != 1)
+				return ret;
+		}
+	}
+
+	/*
+	 * Couldn't find the buffer corresponding to this aux record, something went wrong. Warn but
+	 * don't exit with an error because it will still be possible to decode other aux records.
+	 */
+	pr_err("CS ETM: Couldn't find auxtrace buffer for aux_offset: %#"PRI_lx64
+	       " tid: %d cpu: %d\n", event->aux.aux_offset, sample.tid, sample.cpu);
+	return 0;
+}
+
+static int cs_etm__queue_aux_records(struct perf_session *session)
+{
+	struct auxtrace_index *index = list_first_entry_or_null(&session->auxtrace_index,
+								struct auxtrace_index, list);
+	if (index && index->nr > 0)
+		return perf_session__peek_events(session, session->header.data_offset,
+						 session->header.data_size,
+						 cs_etm__queue_aux_records_cb, NULL);
+
+	/*
+	 * We would get here if there are no entries in the index (either no auxtrace
+	 * buffers or no index at all). Fail silently as there is the possibility of
+	 * queueing them in cs_etm__process_auxtrace_event() if etm->data_queued is still
+	 * false.
+	 *
+	 * In that scenario, buffers will not be split by AUX records.
+	 */
+	return 0;
+}
+
 int cs_etm__process_auxtrace_info(union perf_event *event,
 				  struct perf_session *session)
 {
@@ -2879,7 +3045,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 	if (err)
 		goto err_delete_thread;
 
-	err = auxtrace_queues__process_index(&etm->queues, session);
+	err = cs_etm__queue_aux_records(session);
 	if (err)
 		goto err_delete_thread;
 
-- 
2.28.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-24 16:43 [PATCH v7 0/2] perf cs-etm: Split Coresight decode by aux records James Clark
  2021-06-24 16:43 ` [PATCH v7 1/2] " James Clark
@ 2021-06-24 16:43 ` James Clark
  2021-06-28  1:27   ` Leo Yan
  2021-07-05 19:33   ` Mathieu Poirier
  1 sibling, 2 replies; 17+ messages in thread
From: James Clark @ 2021-06-24 16:43 UTC (permalink / raw)
  To: acme, mathieu.poirier, coresight, leo.yan
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose,
	anshuman.khandual, James Clark, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Currently --dump-raw-trace skips queueing and splitting buffers because
of an early exit condition in cs_etm__process_auxtrace_info(). Once
that is removed we can print the split data by using the queues
and searching for split buffers with the same reference as the
one that is currently being processed.

This keeps the same behaviour of dumping in file order when an AUXTRACE
event appears, rather than moving trace dump to where AUX records are in
the file.

There will be a newline and size printout for each fragment. For example
this buffer is comprised of two AUX records, but was printed as one:

  0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t

  . ... CoreSight ETM Trace data: size 160 bytes
          Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
          Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
          Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
          Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
          Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
          Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;

But is now printed as two fragments:

  0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t

  . ... CoreSight ETM Trace data: size 80 bytes
          Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
          Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
          Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;

  . ... CoreSight ETM Trace data: size 80 bytes
          Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
          Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
          Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;

Decoding errors that appeared in problematic files are now not present,
for example:

        Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
        ...
        PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
 1 file changed, 18 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 88e8122f73c9..ad777c2a342f 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2430,6 +2430,22 @@ static int cs_etm__process_event(struct perf_session *session,
 	return 0;
 }
 
+static void dump_queued_data(struct cs_etm_auxtrace *etm,
+			     struct perf_record_auxtrace *event)
+{
+	struct auxtrace_buffer *buf;
+	unsigned int i;
+	/*
+	 * Find all buffers with same reference in the queues and dump them.
+	 * This is because the queues can contain multiple entries of the same
+	 * buffer that were split on aux records.
+	 */
+	for (i = 0; i < etm->queues.nr_queues; ++i)
+		list_for_each_entry(buf, &etm->queues.queue_array[i].head, list)
+			if (buf->reference == event->reference)
+				cs_etm__dump_event(etm, buf);
+}
+
 static int cs_etm__process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
@@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
 				cs_etm__dump_event(etm, buffer);
 				auxtrace_buffer__put_data(buffer);
 			}
-	}
+	} else if (dump_trace)
+		dump_queued_data(etm, &event->auxtrace);
 
 	return 0;
 }
@@ -3038,7 +3055,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 
 	if (dump_trace) {
 		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
-		return 0;
 	}
 
 	err = cs_etm__synth_events(etm, session);
-- 
2.28.0


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 1/2] perf cs-etm: Split Coresight decode by aux records
  2021-06-24 16:43 ` [PATCH v7 1/2] " James Clark
@ 2021-06-27 11:50   ` Leo Yan
  2021-07-05 19:33   ` Mathieu Poirier
  1 sibling, 0 replies; 17+ messages in thread
From: Leo Yan @ 2021-06-27 11:50 UTC (permalink / raw)
  To: James Clark
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

On Thu, Jun 24, 2021 at 05:43:02PM +0100, James Clark wrote:
> Populate the auxtrace queues using AUX records rather than whole
> auxtrace buffers so that the decoder is reset between each aux record.

[...]

> Signed-off-by: James Clark <james.clark@arm.com>

I tested this patch with kprobe/uprobe events; and confirmed this
patch works as expected.  So:

Tested-by: Leo Yan <leo.yan@linaro.org>


Below shares my testing steps.

- The rationale for the testing is to use "perf probe" to add trace
  points for PERF_RECORD_AUX/PERF_RECORD_AUXTRACE, and add another
  uprobe event for parsing AUX fragment.  So we can compare the
  recorded PERF_RECORD_AUX events, and check if AUX fragment can
  reorganize the perf AUX trace data correctly or not.

- The testing script test_cs_etm_snapshot.sh:
  http://paste.debian.net/1202563/

- Add kprobe event for perf_event_aux_event() so that it can trace
  the perf event PERF_RECORD_AUX:

  # perf probe --add "perf_event_aux_event head=head:x64 size=size:x64"

- Add uprobe event for __auxtrace_mmap__read() so that it can trace
  the perf event PERF_RECORD_AUXTRACE:

  # perf probe -x /mnt/linux-kernel/linux-cs-dev/tools/perf/perf \
     --add "__auxtrace_mmap__read:58 idx=mm->idx head=head:x64 offset=offset:x64 size=size:x64"

- Add uprobe event for cs_etm__queue_aux_fragment():

  # perf probe -x /mnt/linux-kernel/linux-cs-dev/tools/perf/perf \
     --add "cs_etm__queue_aux_fragment:63 aux_offset=aux_offset:x64 aux_size=aux_event->aux_size:x64"

- Enable trace events:

  # cd /sys/kernel/debug/tracing
  # echo 1 > events/probe/enable
  # echo 1 > events/probe_perf/enable

- Executed the testing script and perf report command:

  # sh test_cs_etm_snapshot.sh
  [ perf record: Woken up 7 times to write data ]
  [ perf record: Captured and wrote 19.534 MB /tmp/__perf_test.perf.data.miRDs ]
  # perf report -i /tmp/__perf_test.perf.data.miRDs

- At the end, I can get the tracing result:

  # cd /sys/kernel/debug/tracing
  # cat trace

  # entries-in-buffer/entries-written: 21/21   #P:6
  #
  #                                _-----=> irqs-off
  #                               / _----=> need-resched
  #                              | / _---=> hardirq/softirq
  #                              || / _--=> preempt-depth
  #                              ||| /     delay
  #           TASK-PID     CPU#  ||||   TIMESTAMP  FUNCTION
  #              | |         |   ||||      |         |
                dd-2206    [001] d.h3  1395.922396: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x400000 size=0x400000
              perf-2204    [002] ....  1395.922565: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x400000 offset=0x0 size=0x400000
                dd-2206    [001] d.h3  1395.942658: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x50b3f0 size=0x10b3f0
              perf-2204    [002] ....  1395.942739: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x50b3f0 offset=0x400000 size=0x10b3f0
                dd-2206    [001] d.h3  1396.000040: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x90b3f0 size=0x400000
              perf-2204    [002] ....  1396.000150: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x90b3f0 offset=0x50b3f0 size=0x400000
                dd-2206    [001] d.h3  1396.023098: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0xb87ae0 size=0x27c6f0
              perf-2204    [002] ....  1396.023197: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0xb87ae0 offset=0x90b3f0 size=0x27c6f0
                dd-2206    [001] d.h3  1396.087722: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0xf87ae0 size=0x400000
              perf-2204    [002] ....  1396.087838: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0xf87ae0 offset=0xb87ae0 size=0x400000
                dd-2206    [001] d.h3  1396.131696: perf_event_aux_event: (perf_event_aux_event+0x0/0x108) head=0x1387ae0 size=0x400000
              perf-2204    [002] ....  1396.131808: __auxtrace_mmap__read_L58: (0xaaaaab154394) idx=1 head=0x1387ae0 offset=0xf87ae0 size=0x400000
              perf-2217    [003] ....  1439.581363: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x0 aux_size=0x400000
              perf-2217    [003] ....  1439.581467: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x400000 aux_size=0x10b3f0
              perf-2217    [003] ....  1439.581510: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x50b3f0 aux_size=0x400000
              perf-2217    [003] ....  1439.581553: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0x90b3f0 aux_size=0x27c6f0
              perf-2217    [003] ....  1439.581595: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0xb87ae0 aux_size=0x400000
              perf-2217    [003] ....  1439.581638: cs_etm__queue_aux_fragment_L63: (0xaaaac9582ba8) aux_offset=0xf87ae0 aux_size=0x400000

  We can see the recording AUX trace data in perf_event_aux_event()
  can be decoded properly in events cs_etm__queue_aux_fragment_L63().

Thanks,
Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-24 16:43 ` [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records James Clark
@ 2021-06-28  1:27   ` Leo Yan
  2021-06-28 10:38     ` James Clark
  2021-07-05 19:33   ` Mathieu Poirier
  1 sibling, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-06-28  1:27 UTC (permalink / raw)
  To: James Clark
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

Hi James,

On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> Currently --dump-raw-trace skips queueing and splitting buffers because
> of an early exit condition in cs_etm__process_auxtrace_info(). Once
> that is removed we can print the split data by using the queues
> and searching for split buffers with the same reference as the
> one that is currently being processed.
> 
> This keeps the same behaviour of dumping in file order when an AUXTRACE
> event appears, rather than moving trace dump to where AUX records are in
> the file.
> 
> There will be a newline and size printout for each fragment. For example
> this buffer is comprised of two AUX records, but was printed as one:
> 
>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
> 
>   . ... CoreSight ETM Trace data: size 160 bytes
>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
> 
> But is now printed as two fragments:
> 
>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
> 
>   . ... CoreSight ETM Trace data: size 80 bytes
>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
> 
>   . ... CoreSight ETM Trace data: size 80 bytes
>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
> 
> Decoding errors that appeared in problematic files are now not present,
> for example:
> 
>         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
>         ...
>         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> 
> Signed-off-by: James Clark <james.clark@arm.com>

I tested this patch and the result looks good for me.

I found a side effect introduced by this change is the perf raw dump
also synthesizes events PERF_RECORD_ATTR.  This is because function
cs_etm__synth_events() will execute after applying this patch and it
synthesizes PERF_RECORD_ATTR events.  I don't see any harm for this,
so:

Tested-by: Leo Yan <leo.yan@linaro.org>

Please see an extra comment in below.

> ---
>  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 88e8122f73c9..ad777c2a342f 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2430,6 +2430,22 @@ static int cs_etm__process_event(struct perf_session *session,
>  	return 0;
>  }
>  
> +static void dump_queued_data(struct cs_etm_auxtrace *etm,
> +			     struct perf_record_auxtrace *event)
> +{
> +	struct auxtrace_buffer *buf;
> +	unsigned int i;
> +	/*
> +	 * Find all buffers with same reference in the queues and dump them.
> +	 * This is because the queues can contain multiple entries of the same
> +	 * buffer that were split on aux records.
> +	 */
> +	for (i = 0; i < etm->queues.nr_queues; ++i)
> +		list_for_each_entry(buf, &etm->queues.queue_array[i].head, list)
> +			if (buf->reference == event->reference)
> +				cs_etm__dump_event(etm, buf);
> +}
> +
>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
>  					  union perf_event *event,
>  					  struct perf_tool *tool __maybe_unused)
> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
>  				cs_etm__dump_event(etm, buffer);
>  				auxtrace_buffer__put_data(buffer);
>  			}
> -	}
> +	} else if (dump_trace)
> +		dump_queued_data(etm, &event->auxtrace);

IIUC, in the function cs_etm__process_auxtrace_event(), since
"etm->data_queued" is always true, below flow will never run:

    if (!etm->data_queued) {
        ......

        if (dump_trace)
            if (auxtrace_buffer__get_data(buffer, fd)) {
                    cs_etm__dump_event(etm, buffer);
                    auxtrace_buffer__put_data(buffer);
            }
    }

If so, it's better to use a new patch to polish the code.

Thanks,
Leo

>  
>  	return 0;
>  }
> @@ -3038,7 +3055,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  
>  	if (dump_trace) {
>  		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
> -		return 0;
>  	}
>  
>  	err = cs_etm__synth_events(etm, session);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-28  1:27   ` Leo Yan
@ 2021-06-28 10:38     ` James Clark
  2021-06-28 12:08       ` Leo Yan
  0 siblings, 1 reply; 17+ messages in thread
From: James Clark @ 2021-06-28 10:38 UTC (permalink / raw)
  To: Leo Yan
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel



On 28/06/2021 02:27, Leo Yan wrote:
> Hi James,
> 
> On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
>> Currently --dump-raw-trace skips queueing and splitting buffers because
>> of an early exit condition in cs_etm__process_auxtrace_info(). Once
>> that is removed we can print the split data by using the queues
>> and searching for split buffers with the same reference as the
>> one that is currently being processed.
>>
>> This keeps the same behaviour of dumping in file order when an AUXTRACE
>> event appears, rather than moving trace dump to where AUX records are in
>> the file.
>>
>> There will be a newline and size printout for each fragment. For example
>> this buffer is comprised of two AUX records, but was printed as one:
>>
>>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
>>
>>   . ... CoreSight ETM Trace data: size 160 bytes
>>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
>>
>> But is now printed as two fragments:
>>
>>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
>>
>>   . ... CoreSight ETM Trace data: size 80 bytes
>>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>>
>>   . ... CoreSight ETM Trace data: size 80 bytes
>>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
>>
>> Decoding errors that appeared in problematic files are now not present,
>> for example:
>>
>>         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
>>         ...
>>         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
>>
>> Signed-off-by: James Clark <james.clark@arm.com>
> 
> I tested this patch and the result looks good for me.
> 
> I found a side effect introduced by this change is the perf raw dump
> also synthesizes events PERF_RECORD_ATTR.  This is because function
> cs_etm__synth_events() will execute after applying this patch and it
> synthesizes PERF_RECORD_ATTR events.  I don't see any harm for this,
> so:
> 
> Tested-by: Leo Yan <leo.yan@linaro.org>
> 

Thanks for the testing!

> Please see an extra comment in below.
> 
>> ---
>>  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
>>  1 file changed, 18 insertions(+), 2 deletions(-)
>>
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 88e8122f73c9..ad777c2a342f 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -2430,6 +2430,22 @@ static int cs_etm__process_event(struct perf_session *session,
>>  	return 0;
>>  }
>>  
>> +static void dump_queued_data(struct cs_etm_auxtrace *etm,
>> +			     struct perf_record_auxtrace *event)
>> +{
>> +	struct auxtrace_buffer *buf;
>> +	unsigned int i;
>> +	/*
>> +	 * Find all buffers with same reference in the queues and dump them.
>> +	 * This is because the queues can contain multiple entries of the same
>> +	 * buffer that were split on aux records.
>> +	 */
>> +	for (i = 0; i < etm->queues.nr_queues; ++i)
>> +		list_for_each_entry(buf, &etm->queues.queue_array[i].head, list)
>> +			if (buf->reference == event->reference)
>> +				cs_etm__dump_event(etm, buf);
>> +}
>> +
>>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
>>  					  union perf_event *event,
>>  					  struct perf_tool *tool __maybe_unused)
>> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
>>  				cs_etm__dump_event(etm, buffer);
>>  				auxtrace_buffer__put_data(buffer);
>>  			}
>> -	}
>> +	} else if (dump_trace)
>> +		dump_queued_data(etm, &event->auxtrace);
> 
> IIUC, in the function cs_etm__process_auxtrace_event(), since
> "etm->data_queued" is always true, below flow will never run:
> 
>     if (!etm->data_queued) {
>         ......
> 
>         if (dump_trace)
>             if (auxtrace_buffer__get_data(buffer, fd)) {
>                     cs_etm__dump_event(etm, buffer);
>                     auxtrace_buffer__put_data(buffer);
>             }
>     }
> 
> If so, it's better to use a new patch to polish the code.
> 

Hi Leo,

I think this is not true in piped mode because there is no auxtrace index.
In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
is called for each buffer.

You can reproduce this with something like this:

     ./perf record -o - ls > stdio.data
     cat stdio.data | ./perf report -i -

There are some other Coresight features that don't work as expected in this mode, like
sorting timestamps between CPUs. The aux split patchset won't work either because random
access isn't possible. And the TRBE patch that I'm working on now won't work, because it
also requires the random access to lookup the flags on the AUX record to configure the 
decoder for unformatted trace.


Thanks
James

> Thanks,
> Leo
> 
>>  
>>  	return 0;
>>  }
>> @@ -3038,7 +3055,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>>  
>>  	if (dump_trace) {
>>  		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
>> -		return 0;
>>  	}
>>  
>>  	err = cs_etm__synth_events(etm, session);
>> -- 
>> 2.28.0
>>

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-28 10:38     ` James Clark
@ 2021-06-28 12:08       ` Leo Yan
  2021-06-28 20:01         ` Mathieu Poirier
  0 siblings, 1 reply; 17+ messages in thread
From: Leo Yan @ 2021-06-28 12:08 UTC (permalink / raw)
  To: James Clark
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

On Mon, Jun 28, 2021 at 11:38:34AM +0100, James Clark wrote:

[...]

> >>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
> >>  					  union perf_event *event,
> >>  					  struct perf_tool *tool __maybe_unused)
> >> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
> >>  				cs_etm__dump_event(etm, buffer);
> >>  				auxtrace_buffer__put_data(buffer);
> >>  			}
> >> -	}
> >> +	} else if (dump_trace)
> >> +		dump_queued_data(etm, &event->auxtrace);
> > 
> > IIUC, in the function cs_etm__process_auxtrace_event(), since
> > "etm->data_queued" is always true, below flow will never run:
> > 
> >     if (!etm->data_queued) {
> >         ......
> > 
> >         if (dump_trace)
> >             if (auxtrace_buffer__get_data(buffer, fd)) {
> >                     cs_etm__dump_event(etm, buffer);
> >                     auxtrace_buffer__put_data(buffer);
> >             }
> >     }
> > 
> > If so, it's better to use a new patch to polish the code.
> > 
> 
> Hi Leo,
> 
> I think this is not true in piped mode because there is no auxtrace index.
> In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
> is called for each buffer.
> 
> You can reproduce this with something like this:
> 
>      ./perf record -o - ls > stdio.data
>      cat stdio.data | ./perf report -i -

You are right!  I tried these two commands with cs_etm event, just as
you said, in this case, the AUX trace data is not queued; so the flow
for "if (!etm->data_queued)" should be kept.  If so, I am very fine
for current change.  Thanks for sharing the knowledge.

> There are some other Coresight features that don't work as expected in this mode, like
> sorting timestamps between CPUs. The aux split patchset won't work either because random
> access isn't possible. And the TRBE patch that I'm working on now won't work, because it
> also requires the random access to lookup the flags on the AUX record to configure the 
> decoder for unformatted trace.

Cool, looking forward for the patches :)

Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-28 12:08       ` Leo Yan
@ 2021-06-28 20:01         ` Mathieu Poirier
  2021-06-29  6:09           ` Leo Yan
  2021-06-29  8:52           ` James Clark
  0 siblings, 2 replies; 17+ messages in thread
From: Mathieu Poirier @ 2021-06-28 20:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: James Clark, acme, coresight, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Mon, Jun 28, 2021 at 08:08:02PM +0800, Leo Yan wrote:
> On Mon, Jun 28, 2021 at 11:38:34AM +0100, James Clark wrote:
> 
> [...]
> 
> > >>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
> > >>  					  union perf_event *event,
> > >>  					  struct perf_tool *tool __maybe_unused)
> > >> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
> > >>  				cs_etm__dump_event(etm, buffer);
> > >>  				auxtrace_buffer__put_data(buffer);
> > >>  			}
> > >> -	}
> > >> +	} else if (dump_trace)
> > >> +		dump_queued_data(etm, &event->auxtrace);
> > > 
> > > IIUC, in the function cs_etm__process_auxtrace_event(), since
> > > "etm->data_queued" is always true, below flow will never run:
> > > 
> > >     if (!etm->data_queued) {
> > >         ......
> > > 
> > >         if (dump_trace)
> > >             if (auxtrace_buffer__get_data(buffer, fd)) {
> > >                     cs_etm__dump_event(etm, buffer);
> > >                     auxtrace_buffer__put_data(buffer);
> > >             }
> > >     }
> > > 
> > > If so, it's better to use a new patch to polish the code.
> > > 
> > 
> > Hi Leo,
> > 
> > I think this is not true in piped mode because there is no auxtrace index.
> > In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
> > is called for each buffer.
> > 
> > You can reproduce this with something like this:
> > 
> >      ./perf record -o - ls > stdio.data
> >      cat stdio.data | ./perf report -i -
> 
> You are right!  I tried these two commands with cs_etm event, just as
> you said, in this case, the AUX trace data is not queued; so the flow
> for "if (!etm->data_queued)" should be kept.  If so, I am very fine
> for current change.  Thanks for sharing the knowledge.
> 
> > There are some other Coresight features that don't work as expected in this mode, like
> > sorting timestamps between CPUs. The aux split patchset won't work either because random
> > access isn't possible. And the TRBE patch that I'm working on now won't work, because it
> > also requires the random access to lookup the flags on the AUX record to configure the 
> > decoder for unformatted trace.
>

There is a lot of things happening in this area.  Based on the above should I
still plan to review this set or should I wait for another revision?

Thanks,
Mathieu

> Cool, looking forward for the patches :)
> 
> Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-28 20:01         ` Mathieu Poirier
@ 2021-06-29  6:09           ` Leo Yan
  2021-06-29  8:52           ` James Clark
  1 sibling, 0 replies; 17+ messages in thread
From: Leo Yan @ 2021-06-29  6:09 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: James Clark, acme, coresight, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Hi Mathieu,

On Mon, Jun 28, 2021 at 02:01:32PM -0600, Mathieu Poirier wrote:

[...]

> > > Hi Leo,
> > > 
> > > I think this is not true in piped mode because there is no auxtrace index.
> > > In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
> > > is called for each buffer.
> > > 
> > > You can reproduce this with something like this:
> > > 
> > >      ./perf record -o - ls > stdio.data
> > >      cat stdio.data | ./perf report -i -
> > 
> > You are right!  I tried these two commands with cs_etm event, just as
> > you said, in this case, the AUX trace data is not queued; so the flow
> > for "if (!etm->data_queued)" should be kept.  If so, I am very fine
> > for current change.  Thanks for sharing the knowledge.
> > 
> > > There are some other Coresight features that don't work as expected in this mode, like
> > > sorting timestamps between CPUs. The aux split patchset won't work either because random
> > > access isn't possible. And the TRBE patch that I'm working on now won't work, because it
> > > also requires the random access to lookup the flags on the AUX record to configure the 
> > > decoder for unformatted trace.
> >
> 
> There is a lot of things happening in this area.  Based on the above should I
> still plan to review this set or should I wait for another revision?

I think you could continue to review this patch set for AUX data splitting.

Since we have concern for the AUX data splitting with snapshot mode,
James and me both have verified the AUX data splitting (this patch
set) with snapshot mode, and the testing result shows this patch set
is reliable.

Regard another patch set for fixing snapshot mode [1], I will send a new
version (drop patch 1/3 and refine for patch 3/3), so you could hold
on for that patch set.

Thanks,
Leo

[1] https://lore.kernel.org/patchwork/cover/1437696/

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-28 20:01         ` Mathieu Poirier
  2021-06-29  6:09           ` Leo Yan
@ 2021-06-29  8:52           ` James Clark
  2021-06-29 19:08             ` Mathieu Poirier
  1 sibling, 1 reply; 17+ messages in thread
From: James Clark @ 2021-06-29  8:52 UTC (permalink / raw)
  To: Mathieu Poirier, Leo Yan
  Cc: acme, coresight, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel



On 28/06/2021 21:01, Mathieu Poirier wrote:
> On Mon, Jun 28, 2021 at 08:08:02PM +0800, Leo Yan wrote:
>> On Mon, Jun 28, 2021 at 11:38:34AM +0100, James Clark wrote:
>>
>> [...]
>>
>>>>>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
>>>>>  					  union perf_event *event,
>>>>>  					  struct perf_tool *tool __maybe_unused)
>>>>> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
>>>>>  				cs_etm__dump_event(etm, buffer);
>>>>>  				auxtrace_buffer__put_data(buffer);
>>>>>  			}
>>>>> -	}
>>>>> +	} else if (dump_trace)
>>>>> +		dump_queued_data(etm, &event->auxtrace);
>>>>
>>>> IIUC, in the function cs_etm__process_auxtrace_event(), since
>>>> "etm->data_queued" is always true, below flow will never run:
>>>>
>>>>     if (!etm->data_queued) {
>>>>         ......
>>>>
>>>>         if (dump_trace)
>>>>             if (auxtrace_buffer__get_data(buffer, fd)) {
>>>>                     cs_etm__dump_event(etm, buffer);
>>>>                     auxtrace_buffer__put_data(buffer);
>>>>             }
>>>>     }
>>>>
>>>> If so, it's better to use a new patch to polish the code.
>>>>
>>>
>>> Hi Leo,
>>>
>>> I think this is not true in piped mode because there is no auxtrace index.
>>> In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
>>> is called for each buffer.
>>>
>>> You can reproduce this with something like this:
>>>
>>>      ./perf record -o - ls > stdio.data
>>>      cat stdio.data | ./perf report -i -
>>
>> You are right!  I tried these two commands with cs_etm event, just as
>> you said, in this case, the AUX trace data is not queued; so the flow
>> for "if (!etm->data_queued)" should be kept.  If so, I am very fine
>> for current change.  Thanks for sharing the knowledge.
>>
>>> There are some other Coresight features that don't work as expected in this mode, like
>>> sorting timestamps between CPUs. The aux split patchset won't work either because random
>>> access isn't possible. And the TRBE patch that I'm working on now won't work, because it
>>> also requires the random access to lookup the flags on the AUX record to configure the 
>>> decoder for unformatted trace.
>>
> 
> There is a lot of things happening in this area.  Based on the above should I
> still plan to review this set or should I wait for another revision?

From my point of view, this one is final. It looks like both Leo and I have tested
it with and without his snapshot changes and it's working as expected in both cases.

Thanks
James

> 
> Thanks,
> Mathieu
> 
>> Cool, looking forward for the patches :)
>>
>> Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-29  8:52           ` James Clark
@ 2021-06-29 19:08             ` Mathieu Poirier
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Poirier @ 2021-06-29 19:08 UTC (permalink / raw)
  To: James Clark
  Cc: Leo Yan, Arnaldo Carvalho de Melo, Coresight ML, Al Grant,
	Branislav Rankov, Denis Nikitin, Suzuki K. Poulose,
	Anshuman Khandual, John Garry, Will Deacon, Mike Leach,
	Mark Rutland, Alexander Shishkin, Jiri Olsa, Namhyung Kim,
	linux-arm-kernel, linux-perf-users, Linux Kernel Mailing List

On Tue, 29 Jun 2021 at 02:52, James Clark <james.clark@arm.com> wrote:
>
>
>
> On 28/06/2021 21:01, Mathieu Poirier wrote:
> > On Mon, Jun 28, 2021 at 08:08:02PM +0800, Leo Yan wrote:
> >> On Mon, Jun 28, 2021 at 11:38:34AM +0100, James Clark wrote:
> >>
> >> [...]
> >>
> >>>>>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
> >>>>>                                     union perf_event *event,
> >>>>>                                     struct perf_tool *tool __maybe_unused)
> >>>>> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
> >>>>>                           cs_etm__dump_event(etm, buffer);
> >>>>>                           auxtrace_buffer__put_data(buffer);
> >>>>>                   }
> >>>>> - }
> >>>>> + } else if (dump_trace)
> >>>>> +         dump_queued_data(etm, &event->auxtrace);
> >>>>
> >>>> IIUC, in the function cs_etm__process_auxtrace_event(), since
> >>>> "etm->data_queued" is always true, below flow will never run:
> >>>>
> >>>>     if (!etm->data_queued) {
> >>>>         ......
> >>>>
> >>>>         if (dump_trace)
> >>>>             if (auxtrace_buffer__get_data(buffer, fd)) {
> >>>>                     cs_etm__dump_event(etm, buffer);
> >>>>                     auxtrace_buffer__put_data(buffer);
> >>>>             }
> >>>>     }
> >>>>
> >>>> If so, it's better to use a new patch to polish the code.
> >>>>
> >>>
> >>> Hi Leo,
> >>>
> >>> I think this is not true in piped mode because there is no auxtrace index.
> >>> In that mode, events are processed only in file order and cs_etm__process_auxtrace_event()
> >>> is called for each buffer.
> >>>
> >>> You can reproduce this with something like this:
> >>>
> >>>      ./perf record -o - ls > stdio.data
> >>>      cat stdio.data | ./perf report -i -
> >>
> >> You are right!  I tried these two commands with cs_etm event, just as
> >> you said, in this case, the AUX trace data is not queued; so the flow
> >> for "if (!etm->data_queued)" should be kept.  If so, I am very fine
> >> for current change.  Thanks for sharing the knowledge.
> >>
> >>> There are some other Coresight features that don't work as expected in this mode, like
> >>> sorting timestamps between CPUs. The aux split patchset won't work either because random
> >>> access isn't possible. And the TRBE patch that I'm working on now won't work, because it
> >>> also requires the random access to lookup the flags on the AUX record to configure the
> >>> decoder for unformatted trace.
> >>
> >
> > There is a lot of things happening in this area.  Based on the above should I
> > still plan to review this set or should I wait for another revision?
>
> From my point of view, this one is final. It looks like both Leo and I have tested
> it with and without his snapshot changes and it's working as expected in both cases.
>

Very well - I will start working on it once I'm through with the
ultrasoc patchset.

> Thanks
> James
>
> >
> > Thanks,
> > Mathieu
> >
> >> Cool, looking forward for the patches :)
> >>
> >> Leo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 1/2] perf cs-etm: Split Coresight decode by aux records
  2021-06-24 16:43 ` [PATCH v7 1/2] " James Clark
  2021-06-27 11:50   ` Leo Yan
@ 2021-07-05 19:33   ` Mathieu Poirier
  1 sibling, 0 replies; 17+ messages in thread
From: Mathieu Poirier @ 2021-07-05 19:33 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Thu, Jun 24, 2021 at 05:43:02PM +0100, James Clark wrote:
> Populate the auxtrace queues using AUX records rather than whole
> auxtrace buffers so that the decoder is reset between each aux record.
> 
> This is similar to the auxtrace_queues__process_index() ->
> auxtrace_queues__add_indexed_event() flow where
> perf_session__peek_event() is used to read AUXTRACE events out of
> random positions in the file based on the auxtrace index. But now we
> loop over all PERF_RECORD_AUX events instead of AUXTRACE buffers. For
> each PERF_RECORD_AUX event, we find the corresponding AUXTRACE buffer
> using the index, and add a fragment of that buffer to the auxtrace
> queues. No other changes to decoding were made, apart from populating
> the auxtrace queues. The result of decoding is identical to before,
> except in cases where decoding failed completely, due to not resetting
> the decoder.
> 
> The reason for this change is because AUX records are emitted any time
> tracing is disabled, for example when the process is scheduled out.
> Because ETM was disabled and enabled again, the decoder also needs to
> be reset to force the search for a sync packet. Otherwise there would
> be fatal decoding errors.
> 
> Testing
> =======
> 
> Testing was done with the following script, to diff the decoding results
> between the patched and un-patched versions of perf:
> 
> 	#!/bin/bash
> 	set -ex
> 
> 	$1 script -i $3 $4 > split.script
> 	$2 script -i $3 $4 > default.script
> 
> 	diff split.script default.script | head -n 20
> 
> And it was run like this, with various itrace options depending on the
> quantity of synthesised events:
> 
> 	compare.sh ./perf-patched ./perf-default perf-per-cpu-2-threads.data --itrace=i100000ns
> 
> No changes in output were observed in the following scenarios:
> 
> * Simple per-cpu
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> * Per-thread, single thread
> 	perf record -e cs_etm/@tmc_etr0/u --per-thread ./threads_C
> 
> * Per-thread multiple threads (but only one thread collected data):
> 	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
> 
> * Per-thread multiple threads (both threads collected data):
> 	perf record -e cs_etm/@tmc_etr0/u --per-thread --pid 4596,4597
> 
> * Per-cpu explicit threads:
> 	perf record -e cs_etm/@tmc_etr0/u --pid 853,854
> 
> * System-wide (per-cpu):
>     perf record -e cs_etm/@tmc_etr0/u -a
> 
> * No data collected (no aux buffers)
> 	Can happen with any command when run for a short period
> 
> * Containing truncated records
> 	Can happen with any command
> 
> * Containing aux records with 0 size
> 	Can happen with any command
> 
> * Snapshot mode (various files with and without buffer wrap)
> 	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
> 
> Some differences were observed in the following scenario:
> 
> * Snapshot mode (with duplicate buffers)
> 	perf record -e cs_etm/@tmc_etr0/u -a --snapshot
> 
> Fewer samples are generated in snapshot mode if duplicate buffers
> were gathered because buffers with the same offset are now only added
> once. This gives different, but more correct results and no duplicate
> data is decoded any more.
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 168 ++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 167 insertions(+), 1 deletion(-)

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 64536a6ed10a..88e8122f73c9 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2679,6 +2679,172 @@ static u64 *cs_etm__create_meta_blk(u64 *buff_in, int *buff_in_offset,
>  	return metadata;
>  }
>  
> +/**
> + * Puts a fragment of an auxtrace buffer into the auxtrace queues based
> + * on the bounds of aux_event, if it matches with the buffer that's at
> + * file_offset.
> + *
> + * Normally, whole auxtrace buffers would be added to the queue. But we
> + * want to reset the decoder for every PERF_RECORD_AUX event, and the decoder
> + * is reset across each buffer, so splitting the buffers up in advance has
> + * the same effect.
> + */
> +static int cs_etm__queue_aux_fragment(struct perf_session *session, off_t file_offset, size_t sz,
> +				      struct perf_record_aux *aux_event, struct perf_sample *sample)
> +{
> +	int err;
> +	char buf[PERF_SAMPLE_MAX_SIZE];
> +	union perf_event *auxtrace_event_union;
> +	struct perf_record_auxtrace *auxtrace_event;
> +	union perf_event auxtrace_fragment;
> +	__u64 aux_offset, aux_size;
> +
> +	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
> +						   struct cs_etm_auxtrace,
> +						   auxtrace);
> +
> +	/*
> +	 * There should be a PERF_RECORD_AUXTRACE event at the file_offset that we got
> +	 * from looping through the auxtrace index.
> +	 */
> +	err = perf_session__peek_event(session, file_offset, buf,
> +				       PERF_SAMPLE_MAX_SIZE, &auxtrace_event_union, NULL);
> +	if (err)
> +		return err;
> +	auxtrace_event = &auxtrace_event_union->auxtrace;
> +	if (auxtrace_event->header.type != PERF_RECORD_AUXTRACE)
> +		return -EINVAL;
> +
> +	if (auxtrace_event->header.size < sizeof(struct perf_record_auxtrace) ||
> +		auxtrace_event->header.size != sz) {
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * In per-thread mode, CPU is set to -1, but TID will be set instead. See
> +	 * auxtrace_mmap_params__set_idx(). Return 'not found' if neither CPU nor TID match.
> +	 */
> +	if ((auxtrace_event->cpu == (__u32) -1 && auxtrace_event->tid != sample->tid) ||
> +			auxtrace_event->cpu != sample->cpu)
> +		return 1;
> +
> +	if (aux_event->flags & PERF_AUX_FLAG_OVERWRITE) {
> +		/*
> +		 * Clamp size in snapshot mode. The buffer size is clamped in
> +		 * __auxtrace_mmap__read() for snapshots, so the aux record size doesn't reflect
> +		 * the buffer size.
> +		 */
> +		aux_size = min(aux_event->aux_size, auxtrace_event->size);
> +
> +		/*
> +		 * In this mode, the head also points to the end of the buffer so aux_offset
> +		 * needs to have the size subtracted so it points to the beginning as in normal mode
> +		 */
> +		aux_offset = aux_event->aux_offset - aux_size;
> +	} else {
> +		aux_size = aux_event->aux_size;
> +		aux_offset = aux_event->aux_offset;
> +	}
> +
> +	if (aux_offset >= auxtrace_event->offset &&
> +	    aux_offset + aux_size <= auxtrace_event->offset + auxtrace_event->size) {
> +		/*
> +		 * If this AUX event was inside this buffer somewhere, create a new auxtrace event
> +		 * based on the sizes of the aux event, and queue that fragment.
> +		 */
> +		auxtrace_fragment.auxtrace = *auxtrace_event;
> +		auxtrace_fragment.auxtrace.size = aux_size;
> +		auxtrace_fragment.auxtrace.offset = aux_offset;
> +		file_offset += aux_offset - auxtrace_event->offset + auxtrace_event->header.size;
> +
> +		pr_debug3("CS ETM: Queue buffer size: %#"PRI_lx64" offset: %#"PRI_lx64
> +			  " tid: %d cpu: %d\n", aux_size, aux_offset, sample->tid, sample->cpu);
> +		return auxtrace_queues__add_event(&etm->queues, session, &auxtrace_fragment,
> +						  file_offset, NULL);
> +	}
> +
> +	/* Wasn't inside this buffer, but there were no parse errors. 1 == 'not found' */
> +	return 1;
> +}
> +
> +static int cs_etm__queue_aux_records_cb(struct perf_session *session, union perf_event *event,
> +					u64 offset __maybe_unused, void *data __maybe_unused)
> +{
> +	struct perf_sample sample;
> +	int ret;
> +	struct auxtrace_index_entry *ent;
> +	struct auxtrace_index *auxtrace_index;
> +	struct evsel *evsel;
> +	size_t i;
> +
> +	/* Don't care about any other events, we're only queuing buffers for AUX events */
> +	if (event->header.type != PERF_RECORD_AUX)
> +		return 0;
> +
> +	if (event->header.size < sizeof(struct perf_record_aux))
> +		return -EINVAL;
> +
> +	/* Truncated Aux records can have 0 size and shouldn't result in anything being queued. */
> +	if (!event->aux.aux_size)
> +		return 0;
> +
> +	/*
> +	 * Parse the sample, we need the sample_id_all data that comes after the event so that the
> +	 * CPU or PID can be matched to an AUXTRACE buffer's CPU or PID.
> +	 */
> +	evsel = evlist__event2evsel(session->evlist, event);
> +	if (!evsel)
> +		return -EINVAL;
> +	ret = evsel__parse_sample(evsel, event, &sample);
> +	if (ret)
> +		return ret;
> +
> +	/*
> +	 * Loop through the auxtrace index to find the buffer that matches up with this aux event.
> +	 */
> +	list_for_each_entry(auxtrace_index, &session->auxtrace_index, list) {
> +		for (i = 0; i < auxtrace_index->nr; i++) {
> +			ent = &auxtrace_index->entries[i];
> +			ret = cs_etm__queue_aux_fragment(session, ent->file_offset,
> +							 ent->sz, &event->aux, &sample);
> +			/*
> +			 * Stop search on error or successful values. Continue search on
> +			 * 1 ('not found')
> +			 */
> +			if (ret != 1)
> +				return ret;
> +		}
> +	}
> +
> +	/*
> +	 * Couldn't find the buffer corresponding to this aux record, something went wrong. Warn but
> +	 * don't exit with an error because it will still be possible to decode other aux records.
> +	 */
> +	pr_err("CS ETM: Couldn't find auxtrace buffer for aux_offset: %#"PRI_lx64
> +	       " tid: %d cpu: %d\n", event->aux.aux_offset, sample.tid, sample.cpu);
> +	return 0;
> +}
> +
> +static int cs_etm__queue_aux_records(struct perf_session *session)
> +{
> +	struct auxtrace_index *index = list_first_entry_or_null(&session->auxtrace_index,
> +								struct auxtrace_index, list);
> +	if (index && index->nr > 0)
> +		return perf_session__peek_events(session, session->header.data_offset,
> +						 session->header.data_size,
> +						 cs_etm__queue_aux_records_cb, NULL);
> +
> +	/*
> +	 * We would get here if there are no entries in the index (either no auxtrace
> +	 * buffers or no index at all). Fail silently as there is the possibility of
> +	 * queueing them in cs_etm__process_auxtrace_event() if etm->data_queued is still
> +	 * false.
> +	 *
> +	 * In that scenario, buffers will not be split by AUX records.
> +	 */
> +	return 0;
> +}
> +
>  int cs_etm__process_auxtrace_info(union perf_event *event,
>  				  struct perf_session *session)
>  {
> @@ -2879,7 +3045,7 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  	if (err)
>  		goto err_delete_thread;
>  
> -	err = auxtrace_queues__process_index(&etm->queues, session);
> +	err = cs_etm__queue_aux_records(session);
>  	if (err)
>  		goto err_delete_thread;
>  
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-06-24 16:43 ` [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records James Clark
  2021-06-28  1:27   ` Leo Yan
@ 2021-07-05 19:33   ` Mathieu Poirier
  2021-07-14 17:45     ` Arnaldo Carvalho de Melo
  1 sibling, 1 reply; 17+ messages in thread
From: Mathieu Poirier @ 2021-07-05 19:33 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, John Garry, Will Deacon,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> Currently --dump-raw-trace skips queueing and splitting buffers because
> of an early exit condition in cs_etm__process_auxtrace_info(). Once
> that is removed we can print the split data by using the queues
> and searching for split buffers with the same reference as the
> one that is currently being processed.
> 
> This keeps the same behaviour of dumping in file order when an AUXTRACE
> event appears, rather than moving trace dump to where AUX records are in
> the file.
> 
> There will be a newline and size printout for each fragment. For example
> this buffer is comprised of two AUX records, but was printed as one:
> 
>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
> 
>   . ... CoreSight ETM Trace data: size 160 bytes
>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
> 
> But is now printed as two fragments:
> 
>   0 0 0x8098 [0x30]: PERF_RECORD_AUXTRACE size: 0xa0  offset: 0  ref: 0x491a4dfc52fc0e6e  idx: 0  t
> 
>   . ... CoreSight ETM Trace data: size 80 bytes
>           Idx:0; ID:10;   I_ASYNC : Alignment Synchronisation.
>           Idx:12; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:17; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0x0000000000000000;
> 
>   . ... CoreSight ETM Trace data: size 80 bytes
>           Idx:80; ID:10;  I_ASYNC : Alignment Synchronisation.
>           Idx:92; ID:10;  I_TRACE_INFO : Trace Info.; INFO=0x0 { CC.0 }
>           Idx:97; ID:10;  I_ADDR_L_64IS0 : Address, Long, 64 bit, IS0.; Addr=0xFFFFDE2AD3FD76D4;
> 
> Decoding errors that appeared in problematic files are now not present,
> for example:
> 
>         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
>         ...
>         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
>  1 file changed, 18 insertions(+), 2 deletions(-)
> 

Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>

> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 88e8122f73c9..ad777c2a342f 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2430,6 +2430,22 @@ static int cs_etm__process_event(struct perf_session *session,
>  	return 0;
>  }
>  
> +static void dump_queued_data(struct cs_etm_auxtrace *etm,
> +			     struct perf_record_auxtrace *event)
> +{
> +	struct auxtrace_buffer *buf;
> +	unsigned int i;
> +	/*
> +	 * Find all buffers with same reference in the queues and dump them.
> +	 * This is because the queues can contain multiple entries of the same
> +	 * buffer that were split on aux records.
> +	 */
> +	for (i = 0; i < etm->queues.nr_queues; ++i)
> +		list_for_each_entry(buf, &etm->queues.queue_array[i].head, list)
> +			if (buf->reference == event->reference)
> +				cs_etm__dump_event(etm, buf);
> +}
> +
>  static int cs_etm__process_auxtrace_event(struct perf_session *session,
>  					  union perf_event *event,
>  					  struct perf_tool *tool __maybe_unused)
> @@ -2462,7 +2478,8 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
>  				cs_etm__dump_event(etm, buffer);
>  				auxtrace_buffer__put_data(buffer);
>  			}
> -	}
> +	} else if (dump_trace)
> +		dump_queued_data(etm, &event->auxtrace);
>  
>  	return 0;
>  }
> @@ -3038,7 +3055,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
>  
>  	if (dump_trace) {
>  		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
> -		return 0;
>  	}
>  
>  	err = cs_etm__synth_events(etm, session);
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-07-05 19:33   ` Mathieu Poirier
@ 2021-07-14 17:45     ` Arnaldo Carvalho de Melo
  2021-07-19 16:33       ` Mathieu Poirier
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-07-14 17:45 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

Em Mon, Jul 05, 2021 at 01:33:38PM -0600, Mathieu Poirier escreveu:
> On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> > for example:
> > 
> >         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
> >         ...
> >         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> > 
> > Signed-off-by: James Clark <james.clark@arm.com>
> > ---
> >  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
> >  1 file changed, 18 insertions(+), 2 deletions(-)
> > 
> 
> Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>



Thanks, applied.

- Arnaldo


^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-07-14 17:45     ` Arnaldo Carvalho de Melo
@ 2021-07-19 16:33       ` Mathieu Poirier
  2021-07-19 20:10         ` Arnaldo Carvalho de Melo
  0 siblings, 1 reply; 17+ messages in thread
From: Mathieu Poirier @ 2021-07-19 16:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

Hi Arnaldo,

On Wed, Jul 14, 2021 at 02:45:34PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jul 05, 2021 at 01:33:38PM -0600, Mathieu Poirier escreveu:
> > On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> > > for example:
> > > 
> > >         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
> > >         ...
> > >         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> > > 
> > > Signed-off-by: James Clark <james.clark@arm.com>
> > > ---
> > >  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
> > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > 
> > 
> > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> 
> 
> 
> Thanks, applied.

This patch is the second of a two patch series[1][2].  The first one was applied and
is in Linus' current master branch.  But I can't find the second one in
either your core branch or Linus', and that despite your reply above that it has been
applied.  As such I am guessing something went wrong with the process.  

How do you want to proceed - should we resend the second patch[2] or you still
have it somewhere in your Inbox?

Thanks,
Mathieu

[1]. https://lists.linaro.org/pipermail/coresight/2021-June/006575.html
[2]. https://lists.linaro.org/pipermail/coresight/2021-June/006576.html

> 
> - Arnaldo
> 

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-07-19 16:33       ` Mathieu Poirier
@ 2021-07-19 20:10         ` Arnaldo Carvalho de Melo
  2021-07-20 15:15           ` Mathieu Poirier
  0 siblings, 1 reply; 17+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-07-19 20:10 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

Em Mon, Jul 19, 2021 at 10:33:36AM -0600, Mathieu Poirier escreveu:
> Hi Arnaldo,
> 
> On Wed, Jul 14, 2021 at 02:45:34PM -0300, Arnaldo Carvalho de Melo wrote:
> > Em Mon, Jul 05, 2021 at 01:33:38PM -0600, Mathieu Poirier escreveu:
> > > On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> > > > for example:
> > > > 
> > > >         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
> > > >         ...
> > > >         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> > > > 
> > > > Signed-off-by: James Clark <james.clark@arm.com>
> > > > ---
> > > >  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
> > > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > > 
> > > 
> > > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > 
> > 
> > 
> > Thanks, applied.
> 
> This patch is the second of a two patch series[1][2].  The first one was applied and
> is in Linus' current master branch.  But I can't find the second one in
> either your core branch or Linus', and that despite your reply above that it has been
> applied.  As such I am guessing something went wrong with the process.  
> 
> How do you want to proceed - should we resend the second patch[2] or you still
> have it somewhere in your Inbox?

My bad, its no in tmp.perf/urgent, after testing together with other
patches will move to perf/urgent and by the end of this week I'll submit
it to Linus, sorry.

- Arnaldo
 
> Thanks,
> Mathieu
> 
> [1]. https://lists.linaro.org/pipermail/coresight/2021-June/006575.html
> [2]. https://lists.linaro.org/pipermail/coresight/2021-June/006576.html
> 
> > 
> > - Arnaldo
> > 

-- 

- Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records
  2021-07-19 20:10         ` Arnaldo Carvalho de Melo
@ 2021-07-20 15:15           ` Mathieu Poirier
  0 siblings, 0 replies; 17+ messages in thread
From: Mathieu Poirier @ 2021-07-20 15:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, John Garry,
	Will Deacon, Mike Leach, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	linux-kernel

On Mon, Jul 19, 2021 at 05:10:34PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Mon, Jul 19, 2021 at 10:33:36AM -0600, Mathieu Poirier escreveu:
> > Hi Arnaldo,
> > 
> > On Wed, Jul 14, 2021 at 02:45:34PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Jul 05, 2021 at 01:33:38PM -0600, Mathieu Poirier escreveu:
> > > > On Thu, Jun 24, 2021 at 05:43:03PM +0100, James Clark wrote:
> > > > > for example:
> > > > > 
> > > > >         Idx:808; ID:1c; I_BAD_SEQUENCE : Invalid Sequence in packet.[I_ASYNC]
> > > > >         ...
> > > > >         PKTP_ETMV4I_0016 : 0x0014 (OCSD_ERR_INVALID_PCKT_HDR) [Invalid packet header]; TrcIdx=822
> > > > > 
> > > > > Signed-off-by: James Clark <james.clark@arm.com>
> > > > > ---
> > > > >  tools/perf/util/cs-etm.c | 20 ++++++++++++++++++--
> > > > >  1 file changed, 18 insertions(+), 2 deletions(-)
> > > > > 
> > > > 
> > > > Reviewed-by: Mathieu Poirier <mathieu.poirier@linaro.org>
> > > 
> > > 
> > > 
> > > Thanks, applied.
> > 
> > This patch is the second of a two patch series[1][2].  The first one was applied and
> > is in Linus' current master branch.  But I can't find the second one in
> > either your core branch or Linus', and that despite your reply above that it has been
> > applied.  As such I am guessing something went wrong with the process.  
> > 
> > How do you want to proceed - should we resend the second patch[2] or you still
> > have it somewhere in your Inbox?
> 
> My bad, its no in tmp.perf/urgent, after testing together with other
> patches will move to perf/urgent and by the end of this week I'll submit
> it to Linus, sorry.
>

Very good - thanks for the follow up.

> - Arnaldo
>  
> > Thanks,
> > Mathieu
> > 
> > [1]. https://lists.linaro.org/pipermail/coresight/2021-June/006575.html
> > [2]. https://lists.linaro.org/pipermail/coresight/2021-June/006576.html
> > 
> > > 
> > > - Arnaldo
> > > 
> 
> -- 
> 
> - Arnaldo

^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2021-07-20 15:59 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24 16:43 [PATCH v7 0/2] perf cs-etm: Split Coresight decode by aux records James Clark
2021-06-24 16:43 ` [PATCH v7 1/2] " James Clark
2021-06-27 11:50   ` Leo Yan
2021-07-05 19:33   ` Mathieu Poirier
2021-06-24 16:43 ` [PATCH v7 2/2] perf cs-etm: Split --dump-raw-trace by AUX records James Clark
2021-06-28  1:27   ` Leo Yan
2021-06-28 10:38     ` James Clark
2021-06-28 12:08       ` Leo Yan
2021-06-28 20:01         ` Mathieu Poirier
2021-06-29  6:09           ` Leo Yan
2021-06-29  8:52           ` James Clark
2021-06-29 19:08             ` Mathieu Poirier
2021-07-05 19:33   ` Mathieu Poirier
2021-07-14 17:45     ` Arnaldo Carvalho de Melo
2021-07-19 16:33       ` Mathieu Poirier
2021-07-19 20:10         ` Arnaldo Carvalho de Melo
2021-07-20 15:15           ` Mathieu Poirier

This is a public inbox, see mirroring instructions
on how to clone and mirror all data and code used for this inbox