linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] Split Coresight decode by aux records
@ 2021-02-12 14:45 James Clark
  2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
                   ` (8 more replies)
  0 siblings, 9 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

Hi All,

Since my previous RFC, I've fixed --per-thread mode and solved
most of the open questions. I've also changed --dump-raw-trace
to use the same code path so it's also working now.

I think the only open questions are:
  * General approach
  * If aux records need to be saved, or if they can be pulled
    from elsewhere.

I've also tested perf inject which is now working with troublesome
files.

Thanks
James

James Clark (7):
  perf cs-etm: Split up etm queue setup function
  perf cs-etm: Only search timestamp in current sample's queue.
  perf cs-etm: Save aux records in each etm queue
  perf cs-etm: don't process queues until cs_etm__flush_events
  perf cs-etm: split decode by aux records.
  perf cs-etm: Use existing decode code path for --dump-raw-trace
  perf cs-etm: Suppress printing when resetting decoder

 .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  10 +-
 tools/perf/util/cs-etm.c                      | 300 ++++++++++--------
 2 files changed, 168 insertions(+), 142 deletions(-)

-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* [PATCH 1/7] perf cs-etm: Split up etm queue setup function
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-20  8:11   ` Leo Yan
  2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

Refactor the function into separate allocation and
timestamp search parts. Later the timestamp search
will be done multiple times.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 60 +++++++++++++++++++++-------------------
 1 file changed, 31 insertions(+), 29 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index a2a369e2fbb6..27894facae5e 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -765,33 +765,12 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 	return NULL;
 }
 
-static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
-			       struct auxtrace_queue *queue,
-			       unsigned int queue_nr)
+static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
+	u64 timestamp;
 	unsigned int cs_queue_nr;
 	u8 trace_chan_id;
-	u64 timestamp;
-	struct cs_etm_queue *etmq = queue->priv;
-
-	if (list_empty(&queue->head) || etmq)
-		goto out;
-
-	etmq = cs_etm__alloc_queue(etm);
-
-	if (!etmq) {
-		ret = -ENOMEM;
-		goto out;
-	}
-
-	queue->priv = etmq;
-	etmq->etm = etm;
-	etmq->queue_nr = queue_nr;
-	etmq->offset = 0;
-
-	if (etm->timeless_decoding)
-		goto out;
 
 	/*
 	 * We are under a CPU-wide trace scenario.  As such we need to know
@@ -808,7 +787,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 		 */
 		ret = cs_etm__get_data_block(etmq);
 		if (ret <= 0)
-			goto out;
+			return ret;
 
 		/*
 		 * Run decoder on the trace block.  The decoder will stop when
@@ -817,7 +796,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 		 */
 		ret = cs_etm__decode_data_block(etmq);
 		if (ret)
-			goto out;
+			return ret;
 
 		/*
 		 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
@@ -849,10 +828,33 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	 * Note that packets decoded above are still in the traceID's packet
 	 * queue and will be processed in cs_etm__process_queues().
 	 */
-	cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id);
-	ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp);
-out:
-	return ret;
+	cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
+	return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
+}
+
+static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
+			       struct auxtrace_queue *queue,
+			       unsigned int queue_nr)
+{
+	struct cs_etm_queue *etmq = queue->priv;
+
+	if (list_empty(&queue->head) || etmq)
+		return 0;
+
+	etmq = cs_etm__alloc_queue(etm);
+
+	if (!etmq)
+		return -ENOMEM;
+
+	queue->priv = etmq;
+	etmq->etm = etm;
+	etmq->queue_nr = queue_nr;
+	etmq->offset = 0;
+
+	if (etm->timeless_decoding)
+		return 0;
+	else
+		return cs_etm__search_first_timestamp(etmq);
 }
 
 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue.
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
  2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-20 11:50   ` Leo Yan
  2021-05-06 10:45   ` James Clark
  2021-02-12 14:45 ` [PATCH 3/7] perf cs-etm: Save aux records in each etm queue James Clark
                   ` (6 subsequent siblings)
  8 siblings, 2 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

Change initial timestamp search to only operate on the queue
related to the current event. In a later change the bounds
of the aux record will also be used to reset the decoder and
the record is only relevant to a single queue.

This change makes some files that had coresight data
but didn't syntesise any events start working and generating
events. I'm not sure of the reason for that. I'd expect this
change to only affect the ordering of events.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 30 ++++++++++++++----------------
 1 file changed, 14 insertions(+), 16 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 27894facae5e..8f8b448632fb 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -97,7 +97,7 @@ struct cs_etm_queue {
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
 static int cs_etm__flush_events(struct perf_session *session,
 				struct perf_tool *tool)
 {
-	int ret;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
 	if (!tool->ordered_events)
 		return -EINVAL;
 
-	ret = cs_etm__update_queues(etm);
-
-	if (ret < 0)
-		return ret;
-
 	if (etm->timeless_decoding)
 		return cs_etm__process_timeless_queues(etm, -1);
 
@@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
 	etmq->queue_nr = queue_nr;
 	etmq->offset = 0;
 
-	if (etm->timeless_decoding)
-		return 0;
-	else
-		return cs_etm__search_first_timestamp(etmq);
+	return 0;
 }
 
 static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
@@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 {
+	int ret;
 	if (etm->queues.new_data) {
 		etm->queues.new_data = false;
-		return cs_etm__setup_queues(etm);
+		ret = cs_etm__setup_queues(etm);
+		if (ret)
+			return ret;
 	}
 
-	return 0;
+	if (!etm->timeless_decoding)
+		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
+	else
+		return 0;
 }
 
 static inline
@@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
 	else
 		timestamp = 0;
 
-	if (timestamp || etm->timeless_decoding) {
-		err = cs_etm__update_queues(etm);
+	if ((timestamp || etm->timeless_decoding)
+			&& event->header.type == PERF_RECORD_AUX) {
+		err = cs_etm__update_queues(etm, sample->cpu);
 		if (err)
 			return err;
 	}
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
  2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
  2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-27  7:10   ` Leo Yan
  2021-02-12 14:45 ` [PATCH 4/7] perf cs-etm: don't process queues until cs_etm__flush_events James Clark
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

The aux records will be used set the bounds of decoding in a
later commit. In the future we may also want to use the flags
of each record to control decoding.

Do these need to be saved in their entirety, or can pointers
to each record safely be saved instead for later access?

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 8f8b448632fb..88b541b2a804 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -92,12 +92,16 @@ struct cs_etm_queue {
 	/* Conversion between traceID and index in traceid_queues array */
 	struct intlist *traceid_queues_list;
 	struct cs_etm_traceid_queue **traceid_queues;
+	int aux_record_list_len;
+	int aux_record_list_idx;
+	struct perf_record_aux *aux_record_list;
 };
 
 /* RB tree for quick conversion between traceID and metadata pointers */
 static struct intlist *traceid_list;
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
+				 struct perf_record_aux *aux_record);
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
 static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 					   pid_t tid);
@@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
 
 	cs_etm_decoder__free(etmq->decoder);
 	cs_etm__free_traceid_queues(etmq);
+	free(etmq->aux_record_list);
 	free(etmq);
 }
 
@@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 	return NULL;
 }
 
+static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
+				   struct perf_record_aux *aux_record)
+{
+	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
+					      etmq->aux_record_list_len+1,
+					      sizeof(*etmq->aux_record_list));
+	if (!etmq->aux_record_list)
+		return -ENOMEM;
+
+	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
+	return 0;
+}
+
 static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
@@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
 	return 0;
 }
 
-static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
+static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
 {
 	int ret;
 	if (etm->queues.new_data) {
@@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
 			return ret;
 	}
 
+	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
+	if (cpu < 0)
+		cpu = 0;
+
+	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
+	if (ret)
+		return ret;
+
 	if (!etm->timeless_decoding)
 		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
 	else
@@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
 
 	if ((timestamp || etm->timeless_decoding)
 			&& event->header.type == PERF_RECORD_AUX) {
-		err = cs_etm__update_queues(etm, sample->cpu);
+		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
 		if (err)
 			return err;
 	}
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 4/7] perf cs-etm: don't process queues until cs_etm__flush_events
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (2 preceding siblings ...)
  2021-02-12 14:45 ` [PATCH 3/7] perf cs-etm: Save aux records in each etm queue James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-12 14:45 ` [PATCH 5/7] perf cs-etm: split decode by aux records James Clark
                   ` (4 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

To make sure processing happens in the correct order, queue processing
shouldn't start until every aux queue has had its first timestamp found.

Now that we're only searching for timestamps within each aux record, we
need to wait until all aux records are delivered before starting the
processing.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 4 ----
 1 file changed, 4 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 88b541b2a804..5ab037c2dabe 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2398,10 +2398,6 @@ static int cs_etm__process_event(struct perf_session *session,
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
 
-	if (!etm->timeless_decoding &&
-	    event->header.type == PERF_RECORD_AUX)
-		return cs_etm__process_queues(etm);
-
 	return 0;
 }
 
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 5/7] perf cs-etm: split decode by aux records.
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (3 preceding siblings ...)
  2021-02-12 14:45 ` [PATCH 4/7] perf cs-etm: don't process queues until cs_etm__flush_events James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-12 14:45 ` [PATCH 6/7] perf cs-etm: Use existing decode code path for --dump-raw-trace James Clark
                   ` (3 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

The trace data between aux records is not continuous, so the decoder
must be reset between each record to ensure that parsing happens
correctly and without any early exits.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 109 +++++++++++++++++++++++----------------
 1 file changed, 64 insertions(+), 45 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5ab037c2dabe..3026fcf50b5d 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -95,6 +95,7 @@ struct cs_etm_queue {
 	int aux_record_list_len;
 	int aux_record_list_idx;
 	struct perf_record_aux *aux_record_list;
+	bool timestamp_found;
 };
 
 /* RB tree for quick conversion between traceID and metadata pointers */
@@ -784,6 +785,9 @@ static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 	unsigned int cs_queue_nr;
 	u8 trace_chan_id;
 
+	if (etmq->timestamp_found)
+		return 0;
+
 	/*
 	 * We are under a CPU-wide trace scenario.  As such we need to know
 	 * when the code that generated the traces started to execute so that
@@ -792,56 +796,54 @@ static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
 	 * timestamp.  The timestamp is then added to the auxtrace min heap
 	 * in order to know what nibble (of all the etmqs) to decode first.
 	 */
-	while (1) {
-		/*
-		 * Fetch an aux_buffer from this etmq.  Bail if no more
-		 * blocks or an error has been encountered.
-		 */
-		ret = cs_etm__get_data_block(etmq);
-		if (ret <= 0)
-			return ret;
-
-		/*
-		 * Run decoder on the trace block.  The decoder will stop when
-		 * encountering a timestamp, a full packet queue or the end of
-		 * trace for that block.
-		 */
-		ret = cs_etm__decode_data_block(etmq);
-		if (ret)
-			return ret;
+	/*
+	 * Fetch an aux_buffer from this etmq.  Bail if no more
+	 * blocks or an error has been encountered.
+	 */
+	ret = cs_etm__get_data_block(etmq);
+	if (ret <= 0)
+		return ret;
 
-		/*
-		 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
-		 * the timestamp calculation for us.
-		 */
-		timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id);
+	/*
+	 * Run decoder on the trace block.  The decoder will stop when
+	 * encountering a timestamp, a full packet queue or the end of
+	 * trace for that block.
+	 */
+	ret = cs_etm__decode_data_block(etmq);
+	if (ret)
+		return ret;
 
-		/* We found a timestamp, no need to continue. */
-		if (timestamp)
-			break;
+	/*
+	 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
+	 * the timestamp calculation for us.
+	 */
+	timestamp = cs_etm__etmq_get_timestamp(etmq, &trace_chan_id);
 
+	/* We found a timestamp, no need to continue. */
+	if (timestamp) {
 		/*
-		 * We didn't find a timestamp so empty all the traceid packet
-		 * queues before looking for another timestamp packet, either
-		 * in the current data block or a new one.  Packets that were
-		 * just decoded are useless since no timestamp has been
-		 * associated with them.  As such simply discard them.
+		 * We have a timestamp.  Add it to the min heap to reflect when
+		 * instructions conveyed by the range packets of this traceID queue
+		 * started to execute.  Once the same has been done for all the traceID
+		 * queues of each etmq, redenring and decoding can start in
+		 * chronological order.
+		 *
+		 * Note that packets decoded above are still in the traceID's packet
+		 * queue and will be processed in cs_etm__process_queues().
 		 */
-		cs_etm__clear_all_packet_queues(etmq);
+		etmq->timestamp_found = true;
+		cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
+		return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
 	}
-
 	/*
-	 * We have a timestamp.  Add it to the min heap to reflect when
-	 * instructions conveyed by the range packets of this traceID queue
-	 * started to execute.  Once the same has been done for all the traceID
-	 * queues of each etmq, redenring and decoding can start in
-	 * chronological order.
-	 *
-	 * Note that packets decoded above are still in the traceID's packet
-	 * queue and will be processed in cs_etm__process_queues().
+	 * We didn't find a timestamp so empty all the traceid packet
+	 * queues before looking for another timestamp packet, either
+	 * in the current data block or a new one.  Packets that were
+	 * just decoded are useless since no timestamp has been
+	 * associated with them.  As such simply discard them.
 	 */
-	cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
-	return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
+	cs_etm__clear_all_packet_queues(etmq);
+	return 0;
 }
 
 static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
@@ -1645,6 +1647,13 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 {
 	int ret;
 
+	if (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size <= 0) {
+		etmq->aux_record_list_idx++;
+		ret = cs_etm_decoder__reset(etmq->decoder);
+		if (ret)
+			return ret;
+	}
+
 	if (!etmq->buf_len) {
 		ret = cs_etm__get_trace(etmq);
 		if (ret <= 0)
@@ -2016,6 +2025,15 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 {
 	int ret = 0;
 	size_t processed = 0;
+	u64 decode_size;
+
+	if (etmq->aux_record_list_idx >= etmq->aux_record_list_len ||
+		etmq->aux_record_list[etmq->aux_record_list_idx].aux_size > etmq->buf_len) {
+		// Assume that aux records always equally divide up the aux buffer
+		// so aux_size should never exceed the remaining buffer to decode.
+		ret = -1;
+		goto out;
+	}
 
 	/*
 	 * Packets are decoded and added to the decoder's packet queue
@@ -2024,10 +2042,11 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	 * operations that stop processing are a timestamp packet or a full
 	 * decoder buffer queue.
 	 */
+	decode_size = etmq->aux_record_list[etmq->aux_record_list_idx].aux_size;
 	ret = cs_etm_decoder__process_data_block(etmq->decoder,
 						 etmq->offset,
 						 &etmq->buf[etmq->buf_used],
-						 etmq->buf_len,
+						 decode_size,
 						 &processed);
 	if (ret)
 		goto out;
@@ -2035,7 +2054,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	etmq->offset += processed;
 	etmq->buf_used += processed;
 	etmq->buf_len -= processed;
-
+	etmq->aux_record_list[etmq->aux_record_list_idx].aux_size -= processed;
 out:
 	return ret;
 }
@@ -2160,7 +2179,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 			 */
 			err = cs_etm__process_traceid_queue(etmq, tidq);
 
-		} while (etmq->buf_len);
+		} while (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size > 0);
 
 		if (err == 0)
 			/* Flush any remaining branch stack entries */
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 6/7] perf cs-etm: Use existing decode code path for --dump-raw-trace
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (4 preceding siblings ...)
  2021-02-12 14:45 ` [PATCH 5/7] perf cs-etm: split decode by aux records James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-12 14:45 ` [PATCH 7/7] perf cs-etm: Suppress printing when resetting decoder James Clark
                   ` (2 subsequent siblings)
  8 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

Previously the dump mode created a new decoder for each buffer
and had a different decode loop to the main code paths.

This change uses more of the existing code path which tracks
aux records and resets the decoder between each one. Unfortunately
the decoder is quite noisy when being reset, so printing has to
be suppressed around each call to reset.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 91 ++++++++++++++++------------------------
 1 file changed, 36 insertions(+), 55 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 3026fcf50b5d..dc4885794859 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -474,56 +474,21 @@ static int cs_etm__init_decoder_params(struct cs_etm_decoder_params *d_params,
 	return ret;
 }
 
-static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
-			       struct auxtrace_buffer *buffer)
+static void cs_etm__dump_event(struct cs_etm_queue *etmq)
 {
-	int ret;
 	const char *color = PERF_COLOR_BLUE;
-	struct cs_etm_decoder_params d_params;
-	struct cs_etm_trace_params *t_params;
-	struct cs_etm_decoder *decoder;
-	size_t buffer_used = 0;
+
+	if (etmq->aux_record_list_idx >= etmq->aux_record_list_len)
+		return;
 
 	fprintf(stdout, "\n");
 	color_fprintf(stdout, color,
 		     ". ... CoreSight ETM Trace data: size %zu bytes\n",
-		     buffer->size);
-
-	/* Use metadata to fill in trace parameters for trace decoder */
-	t_params = zalloc(sizeof(*t_params) * etm->num_cpu);
-
-	if (!t_params)
-		return;
-
-	if (cs_etm__init_trace_params(t_params, etm))
-		goto out_free;
-
-	/* Set decoder parameters to simply print the trace packets */
-	if (cs_etm__init_decoder_params(&d_params, NULL,
-					CS_ETM_OPERATION_PRINT))
-		goto out_free;
-
-	decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
+		     etmq->aux_record_list[etmq->aux_record_list_idx].aux_size);
 
-	if (!decoder)
-		goto out_free;
-	do {
-		size_t consumed;
-
-		ret = cs_etm_decoder__process_data_block(
-				decoder, buffer->offset,
-				&((u8 *)buffer->data)[buffer_used],
-				buffer->size - buffer_used, &consumed);
-		if (ret)
+	while (1)
+		if (cs_etm__decode_data_block(etmq) <= 0)
 			break;
-
-		buffer_used += consumed;
-	} while (buffer_used < buffer->size);
-
-	cs_etm_decoder__free(decoder);
-
-out_free:
-	zfree(&t_params);
 }
 
 static int cs_etm__flush_events(struct perf_session *session,
@@ -735,9 +700,15 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
 		goto out_free;
 
 	/* Set decoder parameters to decode trace packets */
-	if (cs_etm__init_decoder_params(&d_params, etmq,
-					CS_ETM_OPERATION_DECODE))
-		goto out_free;
+	if (dump_trace) {
+		if (cs_etm__init_decoder_params(&d_params, NULL,
+					CS_ETM_OPERATION_PRINT))
+			goto out_free;
+	} else {
+		if (cs_etm__init_decoder_params(&d_params, etmq,
+						CS_ETM_OPERATION_DECODE))
+			goto out_free;
+	}
 
 	etmq->decoder = cs_etm_decoder__new(etm->num_cpu, &d_params, t_params);
 
@@ -903,7 +874,7 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct pe
 	if (ret)
 		return ret;
 
-	if (!etm->timeless_decoding)
+	if (!etm->timeless_decoding && !dump_trace)
 		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
 	else
 		return 0;
@@ -1647,6 +1618,9 @@ static int cs_etm__get_data_block(struct cs_etm_queue *etmq)
 {
 	int ret;
 
+	if (etmq->aux_record_list_idx >= etmq->aux_record_list_len)
+		return 0;
+
 	if (etmq->aux_record_list[etmq->aux_record_list_idx].aux_size <= 0) {
 		etmq->aux_record_list_idx++;
 		ret = cs_etm_decoder__reset(etmq->decoder);
@@ -2387,10 +2361,7 @@ static int cs_etm__process_event(struct perf_session *session,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
 
-	if (dump_trace)
-		return 0;
-
-	if (!tool->ordered_events) {
+	if (!tool->ordered_events && !dump_trace) {
 		pr_err("CoreSight ETM Trace requires ordered events\n");
 		return -EINVAL;
 	}
@@ -2407,6 +2378,9 @@ static int cs_etm__process_event(struct perf_session *session,
 			return err;
 	}
 
+	if (dump_trace)
+		return 0;
+
 	if (etm->timeless_decoding &&
 	    event->header.type == PERF_RECORD_EXIT)
 		return cs_etm__process_timeless_queues(etm,
@@ -2424,6 +2398,9 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
 					  union perf_event *event,
 					  struct perf_tool *tool __maybe_unused)
 {
+	unsigned int i;
+	struct auxtrace_queues *queues;
+	struct cs_etm_queue *etmq;
 	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
 						   struct cs_etm_auxtrace,
 						   auxtrace);
@@ -2446,12 +2423,17 @@ static int cs_etm__process_auxtrace_event(struct perf_session *session,
 						 event, data_offset, &buffer);
 		if (err)
 			return err;
+	}
 
-		if (dump_trace)
-			if (auxtrace_buffer__get_data(buffer, fd)) {
-				cs_etm__dump_event(etm, buffer);
-				auxtrace_buffer__put_data(buffer);
+	if (dump_trace) {
+		queues = &etm->queues;
+		for (i = 0; i < queues->nr_queues; i++) {
+			etmq = queues->queue_array[i].priv;
+			if (etmq) {
+				while (cs_etm__get_data_block(etmq) > 0)
+					cs_etm__dump_event(etmq);
 			}
+		}
 	}
 
 	return 0;
@@ -2705,7 +2687,6 @@ int cs_etm__process_auxtrace_info(union perf_event *event,
 
 	if (dump_trace) {
 		cs_etm__print_auxtrace_info(auxtrace_info->priv, num_cpu);
-		return 0;
 	}
 
 	if (session->itrace_synth_opts->set) {
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* [PATCH 7/7] perf cs-etm: Suppress printing when resetting decoder
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (5 preceding siblings ...)
  2021-02-12 14:45 ` [PATCH 6/7] perf cs-etm: Use existing decode code path for --dump-raw-trace James Clark
@ 2021-02-12 14:45 ` James Clark
  2021-02-24 16:13 ` [PATCH 0/7] Split Coresight decode by aux records Mathieu Poirier
  2021-04-15 20:37 ` Mathieu Poirier
  8 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2021-02-12 14:45 UTC (permalink / raw)
  To: coresight
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, John Garry,
	linux-kernel, James Clark, Leo Yan, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

The decoder is quite noisy when being reset. Now that dump-raw-trace
uses a code path that resets the decoder rather than creating a new
one, printing has to be suppressed to not flood the output.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm-decoder/cs-etm-decoder.c | 10 +++++++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
index 3f4bc4050477..e0d530d94e1e 100644
--- a/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
+++ b/tools/perf/util/cs-etm-decoder/cs-etm-decoder.c
@@ -32,6 +32,7 @@
 struct cs_etm_decoder {
 	void *data;
 	void (*packet_printer)(const char *msg);
+	bool suppress_printing;
 	dcd_tree_handle_t dcd_tree;
 	cs_etm_mem_cb_type mem_access;
 	ocsd_datapath_resp_t prev_return;
@@ -71,9 +72,10 @@ int cs_etm_decoder__reset(struct cs_etm_decoder *decoder)
 	ocsd_datapath_resp_t dp_ret;
 
 	decoder->prev_return = OCSD_RESP_CONT;
-
+	decoder->suppress_printing = true;
 	dp_ret = ocsd_dt_process_data(decoder->dcd_tree, OCSD_OP_RESET,
 				      0, 0, NULL, NULL);
+	decoder->suppress_printing = false;
 	if (OCSD_DATA_RESP_IS_FATAL(dp_ret))
 		return -1;
 
@@ -143,8 +145,10 @@ static void cs_etm_decoder__print_str_cb(const void *p_context,
 					 const char *msg,
 					 const int str_len)
 {
-	if (p_context && str_len)
-		((struct cs_etm_decoder *)p_context)->packet_printer(msg);
+	const struct cs_etm_decoder *decoder = p_context;
+
+	if (p_context && str_len && !decoder->suppress_printing)
+		decoder->packet_printer(msg);
 }
 
 static int
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 1/7] perf cs-etm: Split up etm queue setup function
  2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
@ 2021-02-20  8:11   ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2021-02-20  8:11 UTC (permalink / raw)
  To: James Clark
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, coresight,
	John Garry, linux-kernel, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

Hi James,

On Fri, Feb 12, 2021 at 04:45:07PM +0200, James Clark wrote:
> Refactor the function into separate allocation and
> timestamp search parts. Later the timestamp search
> will be done multiple times.

The new introduced function cs_etm__search_first_timestamp() is to
search timestamp; if it cannot find any valid timestamp, it will
drop all packets for every queue with the logic:

  cs_etm__search_first_timestamp()
  {
     timestamp = cs_etm__etmq_get_timestamp();

     if (timestamp)
       return auxtrace_heap__add();  -> heapify timestamp

     cs_etm__clear_all_packet_queues();  -> clear all packets
     return 0;
  }

If the function cs_etm__search_first_timestamp() is invoked for multiple
times, is it possible to clear all packets in the middle of decoding?

From my understanding, it makes sense to drop the trace data at the
very early decoding so that can get the timestamp for packets,
afterwards the packets should always have valid timestamp?  If so, I
don't think it's necessary to contain the function
cs_etm__clear_all_packet_queues() in cs_etm__search_first_timestamp().

I will go back to check this conclusion is correct or not after I
have better understanding for the whole patch set.  Just note here.

Thanks,
Leo

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 60 +++++++++++++++++++++-------------------
>  1 file changed, 31 insertions(+), 29 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index a2a369e2fbb6..27894facae5e 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -765,33 +765,12 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
>  	return NULL;
>  }
>  
> -static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
> -			       struct auxtrace_queue *queue,
> -			       unsigned int queue_nr)
> +static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
>  {
>  	int ret = 0;
> +	u64 timestamp;
>  	unsigned int cs_queue_nr;
>  	u8 trace_chan_id;
> -	u64 timestamp;
> -	struct cs_etm_queue *etmq = queue->priv;
> -
> -	if (list_empty(&queue->head) || etmq)
> -		goto out;
> -
> -	etmq = cs_etm__alloc_queue(etm);
> -
> -	if (!etmq) {
> -		ret = -ENOMEM;
> -		goto out;
> -	}
> -
> -	queue->priv = etmq;
> -	etmq->etm = etm;
> -	etmq->queue_nr = queue_nr;
> -	etmq->offset = 0;
> -
> -	if (etm->timeless_decoding)
> -		goto out;
>  
>  	/*
>  	 * We are under a CPU-wide trace scenario.  As such we need to know
> @@ -808,7 +787,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>  		 */
>  		ret = cs_etm__get_data_block(etmq);
>  		if (ret <= 0)
> -			goto out;
> +			return ret;
>  
>  		/*
>  		 * Run decoder on the trace block.  The decoder will stop when
> @@ -817,7 +796,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>  		 */
>  		ret = cs_etm__decode_data_block(etmq);
>  		if (ret)
> -			goto out;
> +			return ret;
>  
>  		/*
>  		 * Function cs_etm_decoder__do_{hard|soft}_timestamp() does all
> @@ -849,10 +828,33 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>  	 * Note that packets decoded above are still in the traceID's packet
>  	 * queue and will be processed in cs_etm__process_queues().
>  	 */
> -	cs_queue_nr = TO_CS_QUEUE_NR(queue_nr, trace_chan_id);
> -	ret = auxtrace_heap__add(&etm->heap, cs_queue_nr, timestamp);
> -out:
> -	return ret;
> +	cs_queue_nr = TO_CS_QUEUE_NR(etmq->queue_nr, trace_chan_id);
> +	return auxtrace_heap__add(&etmq->etm->heap, cs_queue_nr, timestamp);
> +}
> +
> +static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
> +			       struct auxtrace_queue *queue,
> +			       unsigned int queue_nr)
> +{
> +	struct cs_etm_queue *etmq = queue->priv;
> +
> +	if (list_empty(&queue->head) || etmq)
> +		return 0;
> +
> +	etmq = cs_etm__alloc_queue(etm);
> +
> +	if (!etmq)
> +		return -ENOMEM;
> +
> +	queue->priv = etmq;
> +	etmq->etm = etm;
> +	etmq->queue_nr = queue_nr;
> +	etmq->offset = 0;
> +
> +	if (etm->timeless_decoding)
> +		return 0;
> +	else
> +		return cs_etm__search_first_timestamp(etmq);
>  }
>  
>  static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue.
  2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
@ 2021-02-20 11:50   ` Leo Yan
  2021-03-01 15:28     ` James Clark
  2021-05-06 10:45   ` James Clark
  1 sibling, 1 reply; 19+ messages in thread
From: Leo Yan @ 2021-02-20 11:50 UTC (permalink / raw)
  To: James Clark
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, coresight,
	John Garry, linux-kernel, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
> Change initial timestamp search to only operate on the queue
> related to the current event. In a later change the bounds
> of the aux record will also be used to reset the decoder and
> the record is only relevant to a single queue.

I roughly understand this patch tries to establish the mechanism for
timstamp search per CPU, but I am struggling to understand what's issue
you try to address.

So could you give more description for "what's current issue?" and
"why need to use this way to fix the issue?".  This would be very
appreciated.

> This change makes some files that had coresight data
> but didn't syntesise any events start working and generating
> events. I'm not sure of the reason for that. I'd expect this
> change to only affect the ordering of events.

This seems to me that this patch introduces regression.

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 27894facae5e..8f8b448632fb 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -97,7 +97,7 @@ struct cs_etm_queue {
>  /* RB tree for quick conversion between traceID and metadata pointers */
>  static struct intlist *traceid_list;
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>  					   pid_t tid);
> @@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
>  static int cs_etm__flush_events(struct perf_session *session,
>  				struct perf_tool *tool)
>  {
> -	int ret;
>  	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
>  						   struct cs_etm_auxtrace,
>  						   auxtrace);
> @@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
>  	if (!tool->ordered_events)
>  		return -EINVAL;
>  
> -	ret = cs_etm__update_queues(etm);
> -
> -	if (ret < 0)
> -		return ret;
> -

When flush events, it means the trace data is discontinuous or at the
end of trace data.  If the trace data is discontinuous, we need to use
cs_etm__update_queues() to create new queues.  So if we remove the
calling cs_etm__update_queues(), I suspect it cannot handle the
discontinuous trace data anymore.

>  	if (etm->timeless_decoding)
>  		return cs_etm__process_timeless_queues(etm, -1);
>  
> @@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>  	etmq->queue_nr = queue_nr;
>  	etmq->offset = 0;
>  
> -	if (etm->timeless_decoding)
> -		return 0;
> -	else
> -		return cs_etm__search_first_timestamp(etmq);
> +	return 0;
>  }
>  
>  static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
> @@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>  	return 0;
>  }
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>  {
> +	int ret;
>  	if (etm->queues.new_data) {
>  		etm->queues.new_data = false;
> -		return cs_etm__setup_queues(etm);
> +		ret = cs_etm__setup_queues(etm);

Just remind, the new parameter "cpu" is introduced in this function,
in theory, cs_etm__update_queues() should work for the specified CPU.
But it always setup queues for all CPUs with calling
cs_etm__setup_queues().

> +		if (ret)
> +			return ret;
>  	}
>  
> -	return 0;
> +	if (!etm->timeless_decoding)
> +		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
> +	else
> +		return 0;

In the original code of cs_etm__update_queues(), if there have no any
new data (or has already setup queues), then it does nothing and
directly bails out.

After applied the up change, it will always search the first timestamp
for the "cpu".

>  }
>  
>  static inline
> @@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
>  	else
>  		timestamp = 0;
>  
> -	if (timestamp || etm->timeless_decoding) {
> -		err = cs_etm__update_queues(etm);
> +	if ((timestamp || etm->timeless_decoding)
> +			&& event->header.type == PERF_RECORD_AUX) {
> +		err = cs_etm__update_queues(etm, sample->cpu);

Here might cause potential issue.  Let's see an example:

If CoreSight uses N:1 model for tracers and sink (just like multiple
ETMs output trace to the same ETR), when "sample->cpu" is 0, it will
only initialize the timestamp for CPU 0.  In the same AUX trace data,
it can contains trace for CPU 1, 2, etc; for this case, the flow
doesn't prepare the timestamp for CPU1/2, so it will fail to generate
samples for CPU 1/2, right?

Thanks,
Leo

>  		if (err)
>  			return err;
>  	}
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] Split Coresight decode by aux records
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (6 preceding siblings ...)
  2021-02-12 14:45 ` [PATCH 7/7] perf cs-etm: Suppress printing when resetting decoder James Clark
@ 2021-02-24 16:13 ` Mathieu Poirier
  2021-03-01 14:05   ` James Clark
  2021-04-15 20:37 ` Mathieu Poirier
  8 siblings, 1 reply; 19+ messages in thread
From: Mathieu Poirier @ 2021-02-24 16:13 UTC (permalink / raw)
  To: James Clark
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, suzuki.poulose,
	Alexander Shishkin, Will Deacon, coresight, John Garry,
	linux-kernel, Leo Yan, Namhyung Kim, Jiri Olsa, linux-arm-kernel,
	Mike Leach

Good day James,

I have received your patchset and added it to my queue.  On the flip side it
will be 3 to 4 weeks (from today) before I get a chance to look at it.  As such
I suggest you don't wait on me before addressing the issues found by Leo.

Thanks,
Mathieu

On Fri, Feb 12, 2021 at 04:45:06PM +0200, James Clark wrote:
> Hi All,
> 
> Since my previous RFC, I've fixed --per-thread mode and solved
> most of the open questions. I've also changed --dump-raw-trace
> to use the same code path so it's also working now.
> 
> I think the only open questions are:
>   * General approach
>   * If aux records need to be saved, or if they can be pulled
>     from elsewhere.
> 
> I've also tested perf inject which is now working with troublesome
> files.
> 
> Thanks
> James
> 
> James Clark (7):
>   perf cs-etm: Split up etm queue setup function
>   perf cs-etm: Only search timestamp in current sample's queue.
>   perf cs-etm: Save aux records in each etm queue
>   perf cs-etm: don't process queues until cs_etm__flush_events
>   perf cs-etm: split decode by aux records.
>   perf cs-etm: Use existing decode code path for --dump-raw-trace
>   perf cs-etm: Suppress printing when resetting decoder
> 
>  .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  10 +-
>  tools/perf/util/cs-etm.c                      | 300 ++++++++++--------
>  2 files changed, 168 insertions(+), 142 deletions(-)
> 
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
  2021-02-12 14:45 ` [PATCH 3/7] perf cs-etm: Save aux records in each etm queue James Clark
@ 2021-02-27  7:10   ` Leo Yan
  2021-03-01 15:43     ` James Clark
  0 siblings, 1 reply; 19+ messages in thread
From: Leo Yan @ 2021-02-27  7:10 UTC (permalink / raw)
  To: James Clark
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, coresight,
	John Garry, linux-kernel, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach

On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
> The aux records will be used set the bounds of decoding in a
> later commit. In the future we may also want to use the flags
> of each record to control decoding.
> 
> Do these need to be saved in their entirety, or can pointers
> to each record safely be saved instead for later access?

Rather than introudcing the perf record list, I just wander if we can
use easier method to fix this problem.  So below is the rough idea
(though I don't really verify it):

The essential information we need is what's the valid buffer length can
be used for decoding.  Though cs_etm_queue::buf_len tracks the buffer
length, but it's the buffer length is for the whole AUX buffer, and
which belongs to multiple "PERF_RECORD_AUX" events.  So we cannot decode
at once for the whole trace data in the AUX trace buffer, on the other
hand, the incoming "PERF_RECORD_AUX" event can guide the CoreSight
decoder it should decode how much buffer size.  At the end, the trace
data can be decoded step by step based on the incoming "PERF_RECORD_AUX"
events.

I'd like to propose to add a new field "cs_etm_queue::buf_rec_len", it
stands for the record length based on the RECORD_AUX event.  In
theory, this value should be always less than "cs_etm_queue::buf_len".

When every time the "PERF_RECORD_AUX" event is coming, we find out the
corresponding queue (so this can be applied for "1:1" or "N:1" models
for source and sink), and accumulate "perf_record_aux::aux_size" into
"cs_etm_queue::buf_rec_len".

At the decoder side, it decreases "etmq->buf_rec_len" until to zero for
the current round of decoding (see cs_etm__decode_data_block()).  Since
all the "PERF_RECORD_AUX" event will be processed before
"PERF_RECORD_EXIT" event, so we don't worry the tail trace data will be
ignored.

The main reason for this suggestion is it don't need to change the
significant logic in current code.  I will try to do experiment for this
idea and share back.

James, if you think I miss anything, please correct me as needed.
Thanks!

Leo

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8f8b448632fb..88b541b2a804 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -92,12 +92,16 @@ struct cs_etm_queue {
>  	/* Conversion between traceID and index in traceid_queues array */
>  	struct intlist *traceid_queues_list;
>  	struct cs_etm_traceid_queue **traceid_queues;
> +	int aux_record_list_len;
> +	int aux_record_list_idx;
> +	struct perf_record_aux *aux_record_list;
>  };
>  
>  /* RB tree for quick conversion between traceID and metadata pointers */
>  static struct intlist *traceid_list;
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
> +				 struct perf_record_aux *aux_record);
>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>  					   pid_t tid);
> @@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
>  
>  	cs_etm_decoder__free(etmq->decoder);
>  	cs_etm__free_traceid_queues(etmq);
> +	free(etmq->aux_record_list);
>  	free(etmq);
>  }
>  
> @@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
>  	return NULL;
>  }
>  
> +static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
> +				   struct perf_record_aux *aux_record)
> +{
> +	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
> +					      etmq->aux_record_list_len+1,
> +					      sizeof(*etmq->aux_record_list));
> +	if (!etmq->aux_record_list)
> +		return -ENOMEM;
> +
> +	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
> +	return 0;
> +}
> +
>  static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
>  {
>  	int ret = 0;
> @@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>  	return 0;
>  }
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
>  {
>  	int ret;
>  	if (etm->queues.new_data) {
> @@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>  			return ret;
>  	}
>  
> +	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
> +	if (cpu < 0)
> +		cpu = 0;
> +
> +	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
> +	if (ret)
> +		return ret;
> +
>  	if (!etm->timeless_decoding)
>  		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
>  	else
> @@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
>  
>  	if ((timestamp || etm->timeless_decoding)
>  			&& event->header.type == PERF_RECORD_AUX) {
> -		err = cs_etm__update_queues(etm, sample->cpu);
> +		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
>  		if (err)
>  			return err;
>  	}
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] Split Coresight decode by aux records
  2021-02-24 16:13 ` [PATCH 0/7] Split Coresight decode by aux records Mathieu Poirier
@ 2021-03-01 14:05   ` James Clark
  0 siblings, 0 replies; 19+ messages in thread
From: James Clark @ 2021-03-01 14:05 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, suzuki.poulose,
	Alexander Shishkin, Will Deacon, coresight, John Garry,
	linux-kernel, Leo Yan, Namhyung Kim, Jiri Olsa, linux-arm-kernel,
	Mike Leach


On 24/02/2021 18:13, Mathieu Poirier wrote:
> Good day James,
> 
> I have received your patchset and added it to my queue.  On the flip side it
> will be 3 to 4 weeks (from today) before I get a chance to look at it.  As such
> I suggest you don't wait on me before addressing the issues found by Leo.
> 

Ok, thanks Mathieu. I found that it's only working in --per-thread mode by
coincidence of my input file. So I would suggest to not look too thoroughly
until I have submitted v2. It should also probably still be an RFC rather than PATCH.

Thanks
James

> Thanks,
> Mathieu
> 
> On Fri, Feb 12, 2021 at 04:45:06PM +0200, James Clark wrote:
>> Hi All,
>>
>> Since my previous RFC, I've fixed --per-thread mode and solved
>> most of the open questions. I've also changed --dump-raw-trace
>> to use the same code path so it's also working now.
>>
>> I think the only open questions are:
>>   * General approach
>>   * If aux records need to be saved, or if they can be pulled
>>     from elsewhere.
>>
>> I've also tested perf inject which is now working with troublesome
>> files.
>>
>> Thanks
>> James
>>
>> James Clark (7):
>>   perf cs-etm: Split up etm queue setup function
>>   perf cs-etm: Only search timestamp in current sample's queue.
>>   perf cs-etm: Save aux records in each etm queue
>>   perf cs-etm: don't process queues until cs_etm__flush_events
>>   perf cs-etm: split decode by aux records.
>>   perf cs-etm: Use existing decode code path for --dump-raw-trace
>>   perf cs-etm: Suppress printing when resetting decoder
>>
>>  .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  10 +-
>>  tools/perf/util/cs-etm.c                      | 300 ++++++++++--------
>>  2 files changed, 168 insertions(+), 142 deletions(-)
>>
>> -- 
>> 2.28.0
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue.
  2021-02-20 11:50   ` Leo Yan
@ 2021-03-01 15:28     ` James Clark
  2021-03-02 11:52       ` Leo Yan
  0 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2021-03-01 15:28 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, coresight,
	John Garry, linux-kernel, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach



On 20/02/2021 13:50, Leo Yan wrote:
> On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
>> Change initial timestamp search to only operate on the queue
>> related to the current event. In a later change the bounds
>> of the aux record will also be used to reset the decoder and
>> the record is only relevant to a single queue.
> 
> I roughly understand this patch tries to establish the mechanism for
> timstamp search per CPU, but I am struggling to understand what's issue
> you try to address.
> 
> So could you give more description for "what's current issue?" and
> "why need to use this way to fix the issue?".  This would be very
> appreciated.

Hi Leo,

The issue is that the aux records used to reset the decoder are associated
with a specific CPU/aux queue. Currently when any new data is received, all
queues are searched for a timestamp. We can't do it that way any more because
the aux records aren't available yet.

The reason to fix it this way is because now we can only do decode when
an aux record is received. This will happen multiple times, and will also
be cpu/queue specific.
 
> 
>> This change makes some files that had coresight data
>> but didn't syntesise any events start working and generating
>> events. I'm not sure of the reason for that. I'd expect this
>> change to only affect the ordering of events.
> 
> This seems to me that this patch introduces regression.

I'm wondering if it is a regression, or accidentally fixing a bug.
It doesn't seem like it's possible to go from not generating any samples to
generating lots without accidentally fixing an existing issue. If there is
valid data there, what would be stopping it from generating any samples?

I do need to look into this more closely though to find the real reason for
it, which will probably shed more light on it.

> 
>> Signed-off-by: James Clark <james.clark@arm.com>
>> ---
>>  tools/perf/util/cs-etm.c | 30 ++++++++++++++----------------
>>  1 file changed, 14 insertions(+), 16 deletions(-)
>>
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 27894facae5e..8f8b448632fb 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -97,7 +97,7 @@ struct cs_etm_queue {
>>  /* RB tree for quick conversion between traceID and metadata pointers */
>>  static struct intlist *traceid_list;
>>  
>> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
>> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
>>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>>  					   pid_t tid);
>> @@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
>>  static int cs_etm__flush_events(struct perf_session *session,
>>  				struct perf_tool *tool)
>>  {
>> -	int ret;
>>  	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
>>  						   struct cs_etm_auxtrace,
>>  						   auxtrace);
>> @@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
>>  	if (!tool->ordered_events)
>>  		return -EINVAL;
>>  
>> -	ret = cs_etm__update_queues(etm);
>> -
>> -	if (ret < 0)
>> -		return ret;
>> -
> 
> When flush events, it means the trace data is discontinuous or at the
> end of trace data.  If the trace data is discontinuous, we need to use
> cs_etm__update_queues() to create new queues.  So if we remove the
> calling cs_etm__update_queues(), I suspect it cannot handle the
> discontinuous trace data anymore.

Do you know how to force perf to record data like this? From my experience
etm->queues.new_data is only set once when the file is first opened.

> 
>>  	if (etm->timeless_decoding)
>>  		return cs_etm__process_timeless_queues(etm, -1);
>>  
>> @@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>>  	etmq->queue_nr = queue_nr;
>>  	etmq->offset = 0;
>>  
>> -	if (etm->timeless_decoding)
>> -		return 0;
>> -	else
>> -		return cs_etm__search_first_timestamp(etmq);
>> +	return 0;
>>  }
>>  
>>  static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>> @@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>>  	return 0;
>>  }
>>  
>> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
>> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>>  {
>> +	int ret;
>>  	if (etm->queues.new_data) {
>>  		etm->queues.new_data = false;
>> -		return cs_etm__setup_queues(etm);
>> +		ret = cs_etm__setup_queues(etm);
> 
> Just remind, the new parameter "cpu" is introduced in this function,
> in theory, cs_etm__update_queues() should work for the specified CPU.
> But it always setup queues for all CPUs with calling
> cs_etm__setup_queues().
> 
>> +		if (ret)
>> +			return ret;
>>  	}
>>  
>> -	return 0;
>> +	if (!etm->timeless_decoding)
>> +		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
>> +	else
>> +		return 0;
> 
> In the original code of cs_etm__update_queues(), if there have no any
> new data (or has already setup queues), then it does nothing and
> directly bails out.
> 
> After applied the up change, it will always search the first timestamp
> for the "cpu".
> 

Maybe I need to pull my later change into here that makes it skip the search after
the timestamp is found for the first time. What do you think?

>>  }
>>  
>>  static inline
>> @@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
>>  	else
>>  		timestamp = 0;
>>  
>> -	if (timestamp || etm->timeless_decoding) {
>> -		err = cs_etm__update_queues(etm);
>> +	if ((timestamp || etm->timeless_decoding)
>> +			&& event->header.type == PERF_RECORD_AUX) {
>> +		err = cs_etm__update_queues(etm, sample->cpu);
> 
> Here might cause potential issue.  Let's see an example:
> 
> If CoreSight uses N:1 model for tracers and sink (just like multiple
> ETMs output trace to the same ETR), when "sample->cpu" is 0, it will
> only initialize the timestamp for CPU 0.  In the same AUX trace data,
> it can contains trace for CPU 1, 2, etc; for this case, the flow
> doesn't prepare the timestamp for CPU1/2, so it will fail to generate
> samples for CPU 1/2, right?

If there is only one sink then wouldn't there also only be only one aux queue
and aux records generated for one CPU? I thought each channel in the data
corresponded to a cs_etm_traceid_queue and that is how multiple tracers worked?
But the channel corresponds to a thread rather than a CPU because the struct
looks like this:

    struct cs_etm_traceid_queue {
	    u8 trace_chan_id;
	    pid_t pid, tid;
            ...

There is currently an issue with per-thread mode where AUX events don't have the
CPU ID set and I've assumed CPU 0 which is incorrect. Maybe that is related to
your potential issue as well.

Thanks,
James

> 
> Thanks,
> Leo
> 
>>  		if (err)
>>  			return err;
>>  	}
>> -- 
>> 2.28.0
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
  2021-02-27  7:10   ` Leo Yan
@ 2021-03-01 15:43     ` James Clark
  2021-03-02 12:03       ` Leo Yan
  0 siblings, 1 reply; 19+ messages in thread
From: James Clark @ 2021-03-01 15:43 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mark Rutland, branislav.rankov, al.grant, denik, Mathieu Poirier,
	suzuki.poulose, Alexander Shishkin, Will Deacon, coresight,
	John Garry, linux-kernel, Namhyung Kim, Jiri Olsa,
	linux-arm-kernel, Mike Leach



On 27/02/2021 09:10, Leo Yan wrote:
> On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
>> The aux records will be used set the bounds of decoding in a
>> later commit. In the future we may also want to use the flags
>> of each record to control decoding.
>>
>> Do these need to be saved in their entirety, or can pointers
>> to each record safely be saved instead for later access?
> 
> Rather than introudcing the perf record list, I just wander if we can
> use easier method to fix this problem.  So below is the rough idea
> (though I don't really verify it):
> 
> The essential information we need is what's the valid buffer length can
> be used for decoding.  Though cs_etm_queue::buf_len tracks the buffer
> length, but it's the buffer length is for the whole AUX buffer, and
> which belongs to multiple "PERF_RECORD_AUX" events.  So we cannot decode
> at once for the whole trace data in the AUX trace buffer, on the other
> hand, the incoming "PERF_RECORD_AUX" event can guide the CoreSight
> decoder it should decode how much buffer size.  At the end, the trace
> data can be decoded step by step based on the incoming "PERF_RECORD_AUX"
> events.
> 
> I'd like to propose to add a new field "cs_etm_queue::buf_rec_len", it
> stands for the record length based on the RECORD_AUX event.  In
> theory, this value should be always less than "cs_etm_queue::buf_len".
> 
> When every time the "PERF_RECORD_AUX" event is coming, we find out the
> corresponding queue (so this can be applied for "1:1" or "N:1" models
> for source and sink), and accumulate "perf_record_aux::aux_size" into
> "cs_etm_queue::buf_rec_len".
> 
> At the decoder side, it decreases "etmq->buf_rec_len" until to zero for
> the current round of decoding (see cs_etm__decode_data_block()).  Since
> all the "PERF_RECORD_AUX" event will be processed before
> "PERF_RECORD_EXIT" event, so we don't worry the tail trace data will be
> ignored.
> 
> The main reason for this suggestion is it don't need to change the
> significant logic in current code.  I will try to do experiment for this
> idea and share back.
> 
> James, if you think I miss anything, please correct me as needed.
> Thanks!
> 

This is an interesting idea, I think we could push decoded packets into the
min heap as the aux records are received, and not do anything with them until
the end of the data is reached. That way instead of saving aux records, we'd
save the result of the decode for each aux record.

Currently each cs_etm_queue has a cs_etm_traceid_queue/cs_etm_packet_queue for each
stream, but that would have to be changed to have multiple ones because multiple
packets could be decoded to get through the whole aux record.

It would be a similarly sized change, and could also have a bigger impact on
memory. So I'm not sure if it would help to reduce the changes, but it is possible.

James

> Leo
> 
>> Signed-off-by: James Clark <james.clark@arm.com>
>> ---
>>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
>>  1 file changed, 29 insertions(+), 3 deletions(-)
>>
>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>> index 8f8b448632fb..88b541b2a804 100644
>> --- a/tools/perf/util/cs-etm.c
>> +++ b/tools/perf/util/cs-etm.c
>> @@ -92,12 +92,16 @@ struct cs_etm_queue {
>>  	/* Conversion between traceID and index in traceid_queues array */
>>  	struct intlist *traceid_queues_list;
>>  	struct cs_etm_traceid_queue **traceid_queues;
>> +	int aux_record_list_len;
>> +	int aux_record_list_idx;
>> +	struct perf_record_aux *aux_record_list;
>>  };
>>  
>>  /* RB tree for quick conversion between traceID and metadata pointers */
>>  static struct intlist *traceid_list;
>>  
>> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
>> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
>> +				 struct perf_record_aux *aux_record);
>>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>>  					   pid_t tid);
>> @@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
>>  
>>  	cs_etm_decoder__free(etmq->decoder);
>>  	cs_etm__free_traceid_queues(etmq);
>> +	free(etmq->aux_record_list);
>>  	free(etmq);
>>  }
>>  
>> @@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
>>  	return NULL;
>>  }
>>  
>> +static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
>> +				   struct perf_record_aux *aux_record)
>> +{
>> +	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
>> +					      etmq->aux_record_list_len+1,
>> +					      sizeof(*etmq->aux_record_list));
>> +	if (!etmq->aux_record_list)
>> +		return -ENOMEM;
>> +
>> +	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
>> +	return 0;
>> +}
>> +
>>  static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
>>  {
>>  	int ret = 0;
>> @@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>>  	return 0;
>>  }
>>  
>> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
>>  {
>>  	int ret;
>>  	if (etm->queues.new_data) {
>> @@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>>  			return ret;
>>  	}
>>  
>> +	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
>> +	if (cpu < 0)
>> +		cpu = 0;
>> +
>> +	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
>> +	if (ret)
>> +		return ret;
>> +
>>  	if (!etm->timeless_decoding)
>>  		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
>>  	else
>> @@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
>>  
>>  	if ((timestamp || etm->timeless_decoding)
>>  			&& event->header.type == PERF_RECORD_AUX) {
>> -		err = cs_etm__update_queues(etm, sample->cpu);
>> +		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
>>  		if (err)
>>  			return err;
>>  	}
>> -- 
>> 2.28.0
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue.
  2021-03-01 15:28     ` James Clark
@ 2021-03-02 11:52       ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2021-03-02 11:52 UTC (permalink / raw)
  To: James Clark
  Cc: coresight, al.grant, branislav.rankov, denik, suzuki.poulose,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, John Garry, Will Deacon, Mathieu Poirier,
	linux-arm-kernel, linux-kernel

On Mon, Mar 01, 2021 at 05:28:57PM +0200, James Clark wrote:
> 
> 
> On 20/02/2021 13:50, Leo Yan wrote:
> > On Fri, Feb 12, 2021 at 04:45:08PM +0200, James Clark wrote:
> >> Change initial timestamp search to only operate on the queue
> >> related to the current event. In a later change the bounds
> >> of the aux record will also be used to reset the decoder and
> >> the record is only relevant to a single queue.
> > 
> > I roughly understand this patch tries to establish the mechanism for
> > timstamp search per CPU, but I am struggling to understand what's issue
> > you try to address.
> > 
> > So could you give more description for "what's current issue?" and
> > "why need to use this way to fix the issue?".  This would be very
> > appreciated.
> 
> Hi Leo,
> 
> The issue is that the aux records used to reset the decoder are associated
> with a specific CPU/aux queue. Currently when any new data is received, all
> queues are searched for a timestamp. We can't do it that way any more because
> the aux records aren't available yet.
> 
> The reason to fix it this way is because now we can only do decode when
> an aux record is received. This will happen multiple times, and will also
> be cpu/queue specific.

Okay, I think you are try to establish the logic as below:

- Step 1: Wait for PERF_RECORD_AUX event;
- Step 2: Receive PERF_RECORD_AUX event, knows the aux buffer
  offset and size;
- Step 3: Update the auxtrace queue based on the info from
  PERF_RECORD_AUX event;
- Step 4: Find the first trace for timestamp, drop any trace data
  prior to the timestamp;
- Step 5: Conintue to decode the trace data which is contained in the
  current PERF_RECORD_AUX event;
- Step 6: Finish the decoding and goto step1.

> > 
> >> This change makes some files that had coresight data
> >> but didn't syntesise any events start working and generating
> >> events. I'm not sure of the reason for that. I'd expect this
> >> change to only affect the ordering of events.
> > 
> > This seems to me that this patch introduces regression.
> 
> I'm wondering if it is a regression, or accidentally fixing a bug.
> It doesn't seem like it's possible to go from not generating any samples to
> generating lots without accidentally fixing an existing issue. If there is
> valid data there, what would be stopping it from generating any samples?

I think we need to clarify what's the current code logic:

- Step 1: at beginning, only prepare for auxtrace queues, for N:1
  model (all ETMs use the same sink), there have only one queue; for
  1:1 model (one ETM has its own sink), there have multiple queues.

  The function cs_etm__update_queues() is for this step, it doesn't
  generate any sample and only prepare for the first timestamp so
  later the sampels can be compared with each other for the trace data
  coming from multiple CPUs.

- Step 2: it starts to decode the trace data and synthesize samples,
  this is finished by function cs_etm__process_queues().

Seems to me, now we need to fix the issue in step2, in step2, it
decodes the whole AUX buffer, so the main target is to limit the
decoding length, which should base on the info provided by
PERF_RECORD_AUX event.

IIUC, now you are trying to fix issue in step 1, but step 1 is not the
root cause for the issue.

> I do need to look into this more closely though to find the real reason for
> it, which will probably shed more light on it.
> 
> > 
> >> Signed-off-by: James Clark <james.clark@arm.com>
> >> ---
> >>  tools/perf/util/cs-etm.c | 30 ++++++++++++++----------------
> >>  1 file changed, 14 insertions(+), 16 deletions(-)
> >>
> >> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> >> index 27894facae5e..8f8b448632fb 100644
> >> --- a/tools/perf/util/cs-etm.c
> >> +++ b/tools/perf/util/cs-etm.c
> >> @@ -97,7 +97,7 @@ struct cs_etm_queue {
> >>  /* RB tree for quick conversion between traceID and metadata pointers */
> >>  static struct intlist *traceid_list;
> >>  
> >> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
> >> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
> >>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
> >>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
> >>  					   pid_t tid);
> >> @@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
> >>  static int cs_etm__flush_events(struct perf_session *session,
> >>  				struct perf_tool *tool)
> >>  {
> >> -	int ret;
> >>  	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
> >>  						   struct cs_etm_auxtrace,
> >>  						   auxtrace);
> >> @@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
> >>  	if (!tool->ordered_events)
> >>  		return -EINVAL;
> >>  
> >> -	ret = cs_etm__update_queues(etm);
> >> -
> >> -	if (ret < 0)
> >> -		return ret;
> >> -
> > 
> > When flush events, it means the trace data is discontinuous or at the
> > end of trace data.  If the trace data is discontinuous, we need to use
> > cs_etm__update_queues() to create new queues.  So if we remove the
> > calling cs_etm__update_queues(), I suspect it cannot handle the
> > discontinuous trace data anymore.
> 
> Do you know how to force perf to record data like this? From my experience
> etm->queues.new_data is only set once when the file is first opened.

Maybe can try the snapshot mode, which you can capture multiple times
for the snapshot trace data based on the signals.

Please check the test case [1] for how to use snapshot mode.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/tests/shell/test_arm_coresight.sh#n157

> > 
> >>  	if (etm->timeless_decoding)
> >>  		return cs_etm__process_timeless_queues(etm, -1);
> >>  
> >> @@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
> >>  	etmq->queue_nr = queue_nr;
> >>  	etmq->offset = 0;
> >>  
> >> -	if (etm->timeless_decoding)
> >> -		return 0;
> >> -	else
> >> -		return cs_etm__search_first_timestamp(etmq);
> >> +	return 0;
> >>  }
> >>  
> >>  static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
> >> @@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
> >>  	return 0;
> >>  }
> >>  
> >> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
> >> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
> >>  {
> >> +	int ret;
> >>  	if (etm->queues.new_data) {
> >>  		etm->queues.new_data = false;
> >> -		return cs_etm__setup_queues(etm);
> >> +		ret = cs_etm__setup_queues(etm);
> > 
> > Just remind, the new parameter "cpu" is introduced in this function,
> > in theory, cs_etm__update_queues() should work for the specified CPU.
> > But it always setup queues for all CPUs with calling
> > cs_etm__setup_queues().
> > 
> >> +		if (ret)
> >> +			return ret;
> >>  	}
> >>  
> >> -	return 0;
> >> +	if (!etm->timeless_decoding)
> >> +		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
> >> +	else
> >> +		return 0;
> > 
> > In the original code of cs_etm__update_queues(), if there have no any
> > new data (or has already setup queues), then it does nothing and
> > directly bails out.
> > 
> > After applied the up change, it will always search the first timestamp
> > for the "cpu".
> > 
> 
> Maybe I need to pull my later change into here that makes it skip the search after
> the timestamp is found for the first time. What do you think?

The patches ordering is important for understanding the changes, but let's
firstly sync for the brief direction for the fixing.

> >>  }
> >>  
> >>  static inline
> >> @@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
> >>  	else
> >>  		timestamp = 0;
> >>  
> >> -	if (timestamp || etm->timeless_decoding) {
> >> -		err = cs_etm__update_queues(etm);
> >> +	if ((timestamp || etm->timeless_decoding)
> >> +			&& event->header.type == PERF_RECORD_AUX) {
> >> +		err = cs_etm__update_queues(etm, sample->cpu);
> > 
> > Here might cause potential issue.  Let's see an example:
> > 
> > If CoreSight uses N:1 model for tracers and sink (just like multiple
> > ETMs output trace to the same ETR), when "sample->cpu" is 0, it will
> > only initialize the timestamp for CPU 0.  In the same AUX trace data,
> > it can contains trace for CPU 1, 2, etc; for this case, the flow
> > doesn't prepare the timestamp for CPU1/2, so it will fail to generate
> > samples for CPU 1/2, right?
> 
> If there is only one sink then wouldn't there also only be only one aux queue
> and aux records generated for one CPU? I thought each channel in the data
> corresponded to a cs_etm_traceid_queue and that is how multiple tracers worked?

If has only one sink, multiple CPUs trace data will finally be
recorded into the single AUX buffer with only one queue.

> But the channel corresponds to a thread rather than a CPU because the struct
> looks like this:
> 
>     struct cs_etm_traceid_queue {
> 	    u8 trace_chan_id;
> 	    pid_t pid, tid;
>             ...

Please note, IIUC, struct cs_etm_traceid_queue is used to track specific info
for every CPU, when the decoder decodes the trace data, it can
generate the "trace_chan_id" rather than the CPU ID, it needs to find the
corresponding aux trace queue based on "trace_chan_id", so this is why it
records "trace_chan_id" in the structure cs_etm_traceid_queue.

@Mathiue, is best knowing for this, please corrent me if I make
any mistake at here.

> There is currently an issue with per-thread mode where AUX events don't have the
> CPU ID set and I've assumed CPU 0 which is incorrect. Maybe that is related to
> your potential issue as well.

please refer the function cs_etm__get_cpu(), it converts "trace_chan_id" to
the CPU number.

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
  2021-03-01 15:43     ` James Clark
@ 2021-03-02 12:03       ` Leo Yan
  0 siblings, 0 replies; 19+ messages in thread
From: Leo Yan @ 2021-03-02 12:03 UTC (permalink / raw)
  To: James Clark
  Cc: coresight, al.grant, branislav.rankov, denik, suzuki.poulose,
	Mike Leach, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, John Garry, Will Deacon, Mathieu Poirier,
	linux-arm-kernel, linux-kernel

On Mon, Mar 01, 2021 at 05:43:43PM +0200, James Clark wrote:

[...]

> > I'd like to propose to add a new field "cs_etm_queue::buf_rec_len", it
> > stands for the record length based on the RECORD_AUX event.  In
> > theory, this value should be always less than "cs_etm_queue::buf_len".
> > 
> > When every time the "PERF_RECORD_AUX" event is coming, we find out the
> > corresponding queue (so this can be applied for "1:1" or "N:1" models
> > for source and sink), and accumulate "perf_record_aux::aux_size" into
> > "cs_etm_queue::buf_rec_len".
> > 
> > At the decoder side, it decreases "etmq->buf_rec_len" until to zero for
> > the current round of decoding (see cs_etm__decode_data_block()).  Since
> > all the "PERF_RECORD_AUX" event will be processed before
> > "PERF_RECORD_EXIT" event, so we don't worry the tail trace data will be
> > ignored.
> > 
> > The main reason for this suggestion is it don't need to change the
> > significant logic in current code.  I will try to do experiment for this
> > idea and share back.
> > 
> > James, if you think I miss anything, please correct me as needed.
> > Thanks!
> > 
> 
> This is an interesting idea, I think we could push decoded packets into the
> min heap as the aux records are received, and not do anything with them until
> the end of the data is reached. That way instead of saving aux records, we'd
> save the result of the decode for each aux record.
> 
> Currently each cs_etm_queue has a cs_etm_traceid_queue/cs_etm_packet_queue for each
> stream, but that would have to be changed to have multiple ones because multiple
> packets could be decoded to get through the whole aux record.
> 
> It would be a similarly sized change, and could also have a bigger impact on
> memory. So I'm not sure if it would help to reduce the changes, but it is possible.

Below change is still very coarse and I just did very basic testing for
it, so didn't cover all cases; so simply use it to demonstrate the basic
idea.

Before the event PERF_RECORD_AUX arrives, we don't decode any trace
data.  And after PERF_RECORD_AUX coming, the aux buffer size will be
accumulated into the queue, and decode the trace data for the queue
based on the accumulated buffer length.

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index b9c1d329a7f1..3bd5609b6de4 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -89,7 +89,7 @@ struct cs_etm_queue {
 	u8 pending_timestamp;
 	u64 offset;
 	const unsigned char *buf;
-	size_t buf_len, buf_used;
+	size_t aux_buf_len, buf_len, buf_used;
 	/* Conversion between traceID and index in traceid_queues array */
 	struct intlist *traceid_queues_list;
 	struct cs_etm_traceid_queue **traceid_queues;
@@ -1085,6 +1085,7 @@ cs_etm__get_trace(struct cs_etm_queue *etmq)
 		if (old_buffer)
 			auxtrace_buffer__drop_data(old_buffer);
 		etmq->buf_len = 0;
+		etmq->aux_buf_len = 0;
 		return 0;
 	}
 
@@ -2052,6 +2053,7 @@ static int cs_etm__decode_data_block(struct cs_etm_queue *etmq)
 	etmq->offset += processed;
 	etmq->buf_used += processed;
 	etmq->buf_len -= processed;
+	etmq->aux_buf_len -= processed;
 
 out:
 	return ret;
@@ -2177,7 +2179,7 @@ static int cs_etm__run_decoder(struct cs_etm_queue *etmq)
 			 */
 			err = cs_etm__process_traceid_queue(etmq, tidq);
 
-		} while (etmq->buf_len);
+		} while (etmq->aux_buf_len > 0);
 
 		if (err == 0)
 			/* Flush any remaining branch stack entries */
@@ -2216,6 +2218,27 @@ static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
 	return 0;
 }
 
+static void cs_etm__update_aux_buf_len(struct cs_etm_auxtrace *etm,
+				      struct perf_record_aux *aux)
+{
+	unsigned int cs_queue_nr, queue_nr;
+	struct auxtrace_queue *queue;
+	struct cs_etm_queue *etmq;
+
+	if (!etm->heap.heap_cnt)
+		return;
+
+	/* Take the entry at the top of the min heap */
+	cs_queue_nr = etm->heap.heap_array[0].queue_nr;
+	queue_nr = TO_QUEUE_NR(cs_queue_nr);
+	queue = &etm->queues.queue_array[queue_nr];
+	etmq = queue->priv;
+
+	etmq->aux_buf_len += aux->aux_size;
+	fprintf(stderr, "%s: aux_buf_len=%ld\n", __func__, etmq->aux_buf_len);
+	return;
+}
+
 static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 {
 	int ret = 0;
@@ -2272,6 +2295,9 @@ static int cs_etm__process_queues(struct cs_etm_auxtrace *etm)
 		if (ret < 0)
 			goto out;
 
+		if (etmq->aux_buf_len <= 0)
+			goto out;
+
 		/*
 		 * No more auxtrace_buffers to process in this etmq, simply
 		 * move on to another entry in the auxtrace_heap.
@@ -2414,9 +2440,15 @@ static int cs_etm__process_event(struct perf_session *session,
 	else if (event->header.type == PERF_RECORD_SWITCH_CPU_WIDE)
 		return cs_etm__process_switch_cpu_wide(etm, event);
 
+	fprintf(stderr, "%s: event->header.type=%d\n", __func__, event->header.type);
+
 	if (!etm->timeless_decoding &&
-	    event->header.type == PERF_RECORD_AUX)
+	    event->header.type == PERF_RECORD_AUX) {
+
+		fprintf(stderr, "%s: aux_size=%lld\n", __func__, event->aux.aux_size);
+		cs_etm__update_aux_buf_len(etm, &event->aux);
 		return cs_etm__process_queues(etm);
+	}
 
 	return 0;
 }

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 19+ messages in thread

* Re: [PATCH 0/7] Split Coresight decode by aux records
  2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
                   ` (7 preceding siblings ...)
  2021-02-24 16:13 ` [PATCH 0/7] Split Coresight decode by aux records Mathieu Poirier
@ 2021-04-15 20:37 ` Mathieu Poirier
  8 siblings, 0 replies; 19+ messages in thread
From: Mathieu Poirier @ 2021-04-15 20:37 UTC (permalink / raw)
  To: James Clark
  Cc: coresight, al.grant, branislav.rankov, denik, suzuki.poulose,
	Mike Leach, Leo Yan, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, John Garry, Will Deacon, linux-arm-kernel,
	linux-kernel

Hi James,

On Fri, Feb 12, 2021 at 04:45:06PM +0200, James Clark wrote:
> Hi All,
> 
> Since my previous RFC, I've fixed --per-thread mode and solved
> most of the open questions. I've also changed --dump-raw-trace
> to use the same code path so it's also working now.
> 
> I think the only open questions are:
>   * General approach
>   * If aux records need to be saved, or if they can be pulled
>     from elsewhere.
> 
> I've also tested perf inject which is now working with troublesome
> files.
>

This patchset is now at the top of the coresight review list.  I suggest you
rebase this on top of your two timestamp related patches and send another
revision.
 
> Thanks
> James
> 
> James Clark (7):
>   perf cs-etm: Split up etm queue setup function
>   perf cs-etm: Only search timestamp in current sample's queue.
>   perf cs-etm: Save aux records in each etm queue
>   perf cs-etm: don't process queues until cs_etm__flush_events
>   perf cs-etm: split decode by aux records.
>   perf cs-etm: Use existing decode code path for --dump-raw-trace
>   perf cs-etm: Suppress printing when resetting decoder
> 
>  .../perf/util/cs-etm-decoder/cs-etm-decoder.c |  10 +-
>  tools/perf/util/cs-etm.c                      | 300 ++++++++++--------
>  2 files changed, 168 insertions(+), 142 deletions(-)
> 
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

* Re: [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue.
  2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
  2021-02-20 11:50   ` Leo Yan
@ 2021-05-06 10:45   ` James Clark
  1 sibling, 0 replies; 19+ messages in thread
From: James Clark @ 2021-05-06 10:45 UTC (permalink / raw)
  To: coresight
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose, Mike Leach,
	Leo Yan, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, John Garry, Will Deacon, Mathieu Poirier,
	linux-arm-kernel, linux-kernel



On 12/02/2021 16:45, James Clark wrote:
> Change initial timestamp search to only operate on the queue
> related to the current event. In a later change the bounds
> of the aux record will also be used to reset the decoder and
> the record is only relevant to a single queue.
> 
> This change makes some files that had coresight data
> but didn't syntesise any events start working and generating
> events. I'm not sure of the reason for that. I'd expect this
> change to only affect the ordering of events.

I've gotten to the bottom of this mystery of why decoding starts working because of this change.
Currently:

 * _All_ decoding happens on the first AUX record
 * Decoding depends on binary data (so also depends on MMAP records)
 * Ordering of AUX records and MMAP records is timing sensitive

So there are two scenarios:
 1) The perf.data file contains MMAPs followed by AUX records. Everything works
 2) The perf.data file contains an AUX record, followed by MMAPS, then further AUX records. Decoding never worked.

Per-thread mode (timeless) always worked because we wait for EXIT rather than AUX to start the decode, which is after MMAPS.
Per-cpu mode was always at the mercy of the ordering of events. So it's not a regression that this patchset changes the behaviour
here and it's doing more of 'the right thing' now.

As a separate change I will add a warning to cs_etm__mem_access() when it fails to find the right binary as this is a current sore point.
 
James

> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 30 ++++++++++++++----------------
>  1 file changed, 14 insertions(+), 16 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 27894facae5e..8f8b448632fb 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -97,7 +97,7 @@ struct cs_etm_queue {
>  /* RB tree for quick conversion between traceID and metadata pointers */
>  static struct intlist *traceid_list;
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm);
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>  					   pid_t tid);
> @@ -524,7 +524,6 @@ static void cs_etm__dump_event(struct cs_etm_auxtrace *etm,
>  static int cs_etm__flush_events(struct perf_session *session,
>  				struct perf_tool *tool)
>  {
> -	int ret;
>  	struct cs_etm_auxtrace *etm = container_of(session->auxtrace,
>  						   struct cs_etm_auxtrace,
>  						   auxtrace);
> @@ -534,11 +533,6 @@ static int cs_etm__flush_events(struct perf_session *session,
>  	if (!tool->ordered_events)
>  		return -EINVAL;
>  
> -	ret = cs_etm__update_queues(etm);
> -
> -	if (ret < 0)
> -		return ret;
> -
>  	if (etm->timeless_decoding)
>  		return cs_etm__process_timeless_queues(etm, -1);
>  
> @@ -851,10 +845,7 @@ static int cs_etm__setup_queue(struct cs_etm_auxtrace *etm,
>  	etmq->queue_nr = queue_nr;
>  	etmq->offset = 0;
>  
> -	if (etm->timeless_decoding)
> -		return 0;
> -	else
> -		return cs_etm__search_first_timestamp(etmq);
> +	return 0;
>  }
>  
>  static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
> @@ -874,14 +865,20 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>  	return 0;
>  }
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm)
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>  {
> +	int ret;
>  	if (etm->queues.new_data) {
>  		etm->queues.new_data = false;
> -		return cs_etm__setup_queues(etm);
> +		ret = cs_etm__setup_queues(etm);
> +		if (ret)
> +			return ret;
>  	}
>  
> -	return 0;
> +	if (!etm->timeless_decoding)
> +		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
> +	else
> +		return 0;
>  }
>  
>  static inline
> @@ -2358,8 +2355,9 @@ static int cs_etm__process_event(struct perf_session *session,
>  	else
>  		timestamp = 0;
>  
> -	if (timestamp || etm->timeless_decoding) {
> -		err = cs_etm__update_queues(etm);
> +	if ((timestamp || etm->timeless_decoding)
> +			&& event->header.type == PERF_RECORD_AUX) {
> +		err = cs_etm__update_queues(etm, sample->cpu);
>  		if (err)
>  			return err;
>  	}
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 19+ messages in thread

end of thread, other threads:[~2021-05-06 10:48 UTC | newest]

Thread overview: 19+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
2021-02-20  8:11   ` Leo Yan
2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
2021-02-20 11:50   ` Leo Yan
2021-03-01 15:28     ` James Clark
2021-03-02 11:52       ` Leo Yan
2021-05-06 10:45   ` James Clark
2021-02-12 14:45 ` [PATCH 3/7] perf cs-etm: Save aux records in each etm queue James Clark
2021-02-27  7:10   ` Leo Yan
2021-03-01 15:43     ` James Clark
2021-03-02 12:03       ` Leo Yan
2021-02-12 14:45 ` [PATCH 4/7] perf cs-etm: don't process queues until cs_etm__flush_events James Clark
2021-02-12 14:45 ` [PATCH 5/7] perf cs-etm: split decode by aux records James Clark
2021-02-12 14:45 ` [PATCH 6/7] perf cs-etm: Use existing decode code path for --dump-raw-trace James Clark
2021-02-12 14:45 ` [PATCH 7/7] perf cs-etm: Suppress printing when resetting decoder James Clark
2021-02-24 16:13 ` [PATCH 0/7] Split Coresight decode by aux records Mathieu Poirier
2021-03-01 14:05   ` James Clark
2021-04-15 20:37 ` Mathieu Poirier

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).