linux-kernel.vger.kernel.org archive mirror
* [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples
@ 2020-02-03  1:51 Leo Yan
  2020-02-03  1:51 ` [PATCH v3 1/5] perf cs-etm: Swap packets for " Leo Yan
                   ` (4 more replies)
  0 siblings, 5 replies; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

Let's restart this work [1]. This patch set is a dependency for
callchain support for Arm CoreSight, which will be sent out in a
separate patch set.

This patch series addresses issues with synthesizing instruction
samples. In particular, when the instruction sample period is small,
the current logic cannot synthesize multiple instruction samples within
one instruction range packet.

Patch 0001 swaps packets for instruction samples, so that the option
'--itrace=iNNN' works correctly.

Patch 0002 avoids resetting the last branches for every instruction
sample; if the last branches are reset every time a sample is generated,
later samples in the same range packet can no longer use them.

Patch 0003 fixes the handling of different instruction periods,
especially small sample periods.

Patch 0004 is an optimization for copying last branches; it only copies
last branches once if the instruction samples share the same last
branches.

Patch 0005 is a minor fix for unsigned variable comparison to zero.

This patch set has been rebased on the latest perf/core branch and
verified on a Juno board with the commands below:

  # perf script --itrace=i2
  # perf script --itrace=i2il16
  # perf inject --itrace=i2il16 -i perf.data -o perf.data.new
  # perf inject --itrace=i100il16 -i perf.data -o perf.data.new

Changes from v2:
* Added patch 0001, which fixes swapping packets for instruction
  samples;
* Refined minor commit logs and comments;
* Rebased on the latest perf/core branch.

Changes from v1:
* Rebased patch set on perf/core branch with latest commit 9fec3cd5fa4a
  ("perf map: Check if the map still has some refcounts on exit").

[1] https://patchwork.kernel.org/cover/11222259/


Leo Yan (5):
  perf cs-etm: Swap packets for instruction samples
  perf cs-etm: Continuously record last branch
  perf cs-etm: Correct synthesizing instruction samples
  perf cs-etm: Optimize copying last branches
  perf cs-etm: Fix unsigned variable comparison to zero

 tools/perf/util/cs-etm.c | 142 ++++++++++++++++++++++++++++++++-------
 1 file changed, 118 insertions(+), 24 deletions(-)

-- 
2.17.1



* [PATCH v3 1/5] perf cs-etm: Swap packets for instruction samples
  2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
@ 2020-02-03  1:51 ` Leo Yan
  2020-02-05 15:59   ` Mike Leach
  2020-02-03  1:52 ` [PATCH v3 2/5] perf cs-etm: Continuously record last branch Leo Yan
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:51 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

If the option '--itrace=iNNN' is used with Arm CoreSight trace data, the
perf tool fails to inject instruction samples.  The root cause is that
packets are only swapped for branch samples and last branches but not
for instruction samples, so newly arriving packets cannot be handled
properly when only instruction samples are synthesized.

To fix this issue, this patch also swaps packets for instruction samples.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 5471045ebf5c..3dd5ba34a2c2 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1404,7 +1404,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		}
 	}
 
-	if (etm->sample_branches || etm->synth_opts.last_branch) {
+	if (etm->sample_branches || etm->synth_opts.last_branch ||
+	    etm->sample_instructions) {
 		/*
 		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 		 * the next incoming packet.
@@ -1476,7 +1477,8 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 	}
 
 swap_packet:
-	if (etm->sample_branches || etm->synth_opts.last_branch) {
+	if (etm->sample_branches || etm->synth_opts.last_branch ||
+	    etm->sample_instructions) {
 		/*
 		 * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
 		 * the next incoming packet.
-- 
2.17.1



* [PATCH v3 2/5] perf cs-etm: Continuously record last branch
  2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
  2020-02-03  1:51 ` [PATCH v3 1/5] perf cs-etm: Swap packets for " Leo Yan
@ 2020-02-03  1:52 ` Leo Yan
  2020-02-05 16:01   ` Mike Leach
  2020-02-03  1:52 ` [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples Leo Yan
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

Every time an instruction sample is synthesized, the last branch
recording is reset.  This is fine if the instruction period is big
enough: for example, with the option '--itrace=i100000' the last branch
array is reset once per sample, i.e. every 100000 instructions, and
enough packets arrive before the next instruction sample to refill the
last branch array.

On the other hand, with a very small period far fewer packets arrive
between two consecutive instruction samples, so the frequent resets
leave the last branch array almost empty for each new instruction
sample.

To allow the last branches to work properly for any instruction period,
this patch stops resetting the last branches for every instruction
sample and only resets them when the trace data is flushed.  The last
branches are then reset in only two cases: when the trace starts and
when the trace is discontinuous.  In all other cases the last branches
keep being recorded across consecutive instruction samples.
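
The effect described above can be illustrated with a toy model.  This is
only a sketch under simplifying assumptions that are not part of the
patch: each range packet records exactly one branch, a sample is taken
every 'packets_per_sample' packets, and the function name and parameters
are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

/*
 * Toy model (not perf code): return how many last-branch entries are
 * available when the final sample is generated.
 */
static size_t demo_branches_at_last_sample(size_t packets,
					   size_t packets_per_sample,
					   size_t capacity,
					   bool reset_per_sample)
{
	size_t filled = 0, seen = 0, i;

	if (packets_per_sample == 0)
		return 0;

	for (i = 1; i <= packets; i++) {
		/* Assumption: one branch is recorded per range packet */
		if (filled < capacity)
			filled++;

		if (i % packets_per_sample == 0) {
			/* A sample is generated and sees 'filled' entries */
			seen = filled;
			/* Old behaviour: drop all entries after each sample */
			if (reset_per_sample)
				filled = 0;
		}
	}

	return seen;
}
```

With 64 packets, one sample every 2 packets and a 16-entry buffer, the
per-sample reset leaves only 2 entries for each sample, while the
continuous recording fills all 16.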

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 3dd5ba34a2c2..3e28462609e7 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1153,9 +1153,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 			"CS ETM Trace: failed to deliver instruction event, error %d\n",
 			ret);
 
-	if (etm->synth_opts.last_branch)
-		cs_etm__reset_last_branch_rb(tidq);
-
 	return ret;
 }
 
@@ -1488,6 +1485,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		tidq->prev_packet = tmp;
 	}
 
+	/* Reset last branches after flush the trace */
+	if (etm->synth_opts.last_branch)
+		cs_etm__reset_last_branch_rb(tidq);
+
 	return err;
 }
 
-- 
2.17.1



* [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples
  2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
  2020-02-03  1:51 ` [PATCH v3 1/5] perf cs-etm: Swap packets for " Leo Yan
  2020-02-03  1:52 ` [PATCH v3 2/5] perf cs-etm: Continuously record last branch Leo Yan
@ 2020-02-03  1:52 ` Leo Yan
  2020-02-05 16:09   ` Mike Leach
  2020-02-03  1:52 ` [PATCH v3 4/5] perf cs-etm: Optimize copying last branches Leo Yan
  2020-02-03  1:52 ` [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
  4 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

When 'etm->instructions_sample_period' is less than
'tidq->period_instructions', the logic in cs_etm__sample() does not
handle the case properly.

Consider the following flow as an example:

- With the itrace option '--itrace=i4', cs_etm__sample() starts with
  the following initial values:

  tidq->period_instructions = 0
  etm->instructions_sample_period = 4

- When the first packet arrives:

  packet->instr_count = 10; the packet covers 10 executed instructions,
  so period_instructions is updated as below:

  tidq->period_instructions = 0 + 10 = 10
  instrs_over = 10 - 4 = 6
  offset = 10 - 6 - 1 = 3
  tidq->period_instructions = instrs_over = 6

- When the second packet arrives:

  packet->instr_count = 10; assume the second packet also covers 10
  instructions:

  tidq->period_instructions = 6 + 10 = 16
  instrs_over = 16 - 4 = 12
  offset = 10 - 12 - 1 = -3  -> a negative value
  tidq->period_instructions = instrs_over = 12

So after handling these two packets, there are two issues:

The first issue is that cs_etm__instr_addr() returns the address of the
instruction at the given offset within the current trace sample, so the
offset is expected to always be a non-negative value.  In practice,
cs_etm__sample() can calculate a negative offset (-3 when handling the
second packet) and pass it to cs_etm__instr_addr() as a u64, where it
becomes a very large positive integer.

The second issue is that only 2 samples are synthesized for sample
period = 4.  Each packet has 10 instructions, so the two packets have 20
instructions in total, which should generate 5 samples (4 x 5 = 20).
This happens because cs_etm__sample() calls
cs_etm__synth_instruction_sample() only once per range packet.
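
The arithmetic of the flow above can be reproduced with a small
standalone sketch.  It only models the integer calculations; the struct
and function names are hypothetical and none of this is the actual perf
code.

```c
#include <stdint.h>

/* Hypothetical stand-in for the per-thread state in the example */
struct demo_old_state {
	uint64_t period_instructions;
};

/*
 * Reproduce the pre-patch calculation for one packet and return the
 * offset that would be handed to cs_etm__instr_addr().
 */
static uint64_t demo_old_sample_offset(struct demo_old_state *st,
				       uint64_t instr_count, uint64_t period)
{
	uint64_t instrs_over, offset;

	st->period_instructions += instr_count;

	/* Instructions executed after the (single) sample point */
	instrs_over = st->period_instructions - period;

	/* Wraps around when instrs_over >= instr_count */
	offset = instr_count - instrs_over - 1;

	st->period_instructions = instrs_over;
	return offset;
}
```

Feeding two 10-instruction packets with period 4 gives offset 3 for the
first packet and (u64)-3 for the second, with 12 instructions still
carried over and only one sample per packet.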

This patch fixes the logic in cs_etm__sample(); the basic idea is to
handle an incoming packet in three parts:

- The first part synthesizes the first instruction sample; it combines
  the instructions from the tail of the previous packet with the
  instructions from the head of the new packet;
- The second part simply generates samples aligned to the sample
  period;
- The third part is the tail of the new packet; the remaining
  instructions are left for handling with subsequent packets.
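
The three parts can be sketched as a standalone function operating on
plain integers.  This is an illustration only, not the code in the
patch: the function name, the 'offsets' output array and the
'max_offsets' parameter are hypothetical, and the sketch assumes the
carried-over count is always smaller than the period on entry.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Split one range packet into sample points.  The 0-based offsets of
 * the sampled instructions within the packet are stored in 'offsets'
 * (at most 'max_offsets' entries); the number of samples is returned.
 * '*period_instructions' holds the instructions carried over from
 * earlier packets and is assumed to be smaller than 'period'.
 */
static size_t demo_split_packet(uint64_t *period_instructions,
				uint64_t instr_count, uint64_t period,
				uint64_t *offsets, size_t max_offsets)
{
	uint64_t carried = *period_instructions;
	uint64_t instrs_over = instr_count;
	uint64_t head, offset;
	size_t n = 0;

	/* Not enough instructions for a sample yet: just accumulate */
	if (period == 0 || carried + instr_count < period) {
		*period_instructions = carried + instr_count;
		return 0;
	}

	/* Part 1: tail of the previous packet plus head of this one */
	head = period - carried;
	offset = head;
	if (n < max_offsets)
		offsets[n] = offset - 1;
	n++;
	instrs_over -= head;

	/* Part 2: samples aligned to the sample period */
	while (instrs_over >= period) {
		offset += period;
		if (n < max_offsets)
			offsets[n] = offset - 1;
		n++;
		instrs_over -= period;
	}

	/* Part 3: the tail is carried into the next packet */
	*period_instructions = instrs_over;
	return n;
}
```

For the example above (period 4, two packets of 10 instructions) this
yields samples at offsets 3 and 7 in the first packet, carries over 2
instructions, and yields offsets 1, 5 and 9 in the second packet, i.e.
5 samples in total.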

Suggested-by: Mike Leach <mike.leach@linaro.org>
Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++-----
 1 file changed, 92 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 3e28462609e7..c5a05f728eac 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 * TODO: allow period to be defined in cycles and clock time
 		 */
 
-		/* Get number of instructions executed after the sample point */
-		u64 instrs_over = tidq->period_instructions -
-			etm->instructions_sample_period;
+		/*
+		 * Below diagram demonstrates the instruction samples
+		 * generation flows:
+		 *
+		 *    Instrs     Instrs       Instrs       Instrs
+		 *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
+		 *    |            |            |            |
+		 *    V            V            V            V
+		 *   --------------------------------------------------
+		 *            ^                                  ^
+		 *            |                                  |
+		 *         Period                             Period
+		 *    instructions(Pi)                   instructions(Pi')
+		 *
+		 *            |                                  |
+		 *            \---------------- -----------------/
+		 *                             V
+		 *                      instrs_executed
+		 *
+		 * Period instructions (Pi) contains the the number of
+		 * instructions executed after the sample point(n).  When a new
+		 * instruction packet is coming and generate for the next sample
+		 * (n+1), it combines with two parts instructions, one is the
+		 * tail of the old packet and another is the head of the new
+		 * coming packet.  So 'head' variable is used to cauclate the
+		 * instruction numbers in the new packet for sample(n+1).
+		 *
+		 * Sample(n+2) and sample(n+3) consume the instructions with
+		 * sample period, so directly generate samples based on the
+		 * sampe period.
+		 *
+		 * After sample(n+3), the rest instructions will be used by
+		 * later packet; so use 'instrs_over' to track the rest
+		 * instruction number and it is assigned to
+		 * 'tidq->period_instructions' for next round calculation.
+		 */
+		u64 head, offset = 0;
+		u64 addr;
 
 		/*
-		 * Calculate the address of the sampled instruction (-1 as
-		 * sample is reported as though instruction has just been
-		 * executed, but PC has not advanced to next instruction)
+		 * 'instrs_over' is the number of instructions executed after
+		 * sample points, initialise it to 'instrs_executed' and will
+		 * decrease it for consumed instructions in every synthesized
+		 * instruction sample.
 		 */
-		u64 offset = (instrs_executed - instrs_over - 1);
-		u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
-					      tidq->packet, offset);
+		u64 instrs_over = instrs_executed;
 
-		ret = cs_etm__synth_instruction_sample(
-			etmq, tidq, addr, etm->instructions_sample_period);
-		if (ret)
-			return ret;
+		/*
+		 * 'head' is the instructions number of the head in the new
+		 * packet, it combines with the tail of previous packet to
+		 * generate a sample.  So 'head' uses the sample period to
+		 * decrease the instruction number introduced by the previous
+		 * packet.
+		 */
+		head = etm->instructions_sample_period -
+				  (tidq->period_instructions - instrs_executed);
+
+		if (head) {
+			offset = head;
+
+			/*
+			 * Calculate the address of the sampled instruction (-1
+			 * as sample is reported as though instruction has just
+			 * been executed, but PC has not advanced to next
+			 * instruction)
+			 */
+			addr = cs_etm__instr_addr(etmq, trace_chan_id,
+						  tidq->packet, offset - 1);
+			ret = cs_etm__synth_instruction_sample(
+				etmq, tidq, addr,
+				etm->instructions_sample_period);
+			if (ret)
+				return ret;
+
+			instrs_over -= head;
+		}
+
+		while (instrs_over >= etm->instructions_sample_period) {
+			offset += etm->instructions_sample_period;
+
+			/*
+			 * Calculate the address of the sampled instruction (-1
+			 * as sample is reported as though instruction has just
+			 * been executed, but PC has not advanced to next
+			 * instruction)
+			 */
+			addr = cs_etm__instr_addr(etmq, trace_chan_id,
+						  tidq->packet, offset - 1);
+			ret = cs_etm__synth_instruction_sample(
+				etmq, tidq, addr,
+				etm->instructions_sample_period);
+			if (ret)
+				return ret;
+
+			instrs_over -= etm->instructions_sample_period;
+		}
 
 		/* Carry remaining instructions into next sample period */
 		tidq->period_instructions = instrs_over;
-- 
2.17.1



* [PATCH v3 4/5] perf cs-etm: Optimize copying last branches
  2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
                   ` (2 preceding siblings ...)
  2020-02-03  1:52 ` [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples Leo Yan
@ 2020-02-03  1:52 ` Leo Yan
  2020-02-06 11:47   ` Mike Leach
  2020-02-03  1:52 ` [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
  4 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

If an instruction range packet generates multiple instruction samples,
these samples share the same last branches; it is not necessary to copy
the same last branches repeatedly for each sample within the same
packet.

This patch moves the copying of last branches out of
cs_etm__synth_instruction_sample() and performs it once before the
instruction samples are generated.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index c5a05f728eac..dbddf1eec2be 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1134,10 +1134,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 
 	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
 
-	if (etm->synth_opts.last_branch) {
-		cs_etm__copy_last_branch_rb(etmq, tidq);
+	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
-	}
 
 	if (etm->synth_opts.inject) {
 		ret = cs_etm__inject_event(event, &sample,
@@ -1407,6 +1405,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 		 */
 		u64 instrs_over = instrs_executed;
 
+		/* Prepare last branches for instruction sample */
+		if (etm->synth_opts.last_branch)
+			cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * 'head' is the instructions number of the head in the new
 		 * packet, it combines with the tail of previous packet to
@@ -1526,6 +1528,11 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 
 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Generate a last branch event for the branches left in the
 		 * circular buffer at the end of the trace.
@@ -1533,7 +1540,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 		 * Use the address of the end of the last reported execution
 		 * range
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
@@ -1587,11 +1594,16 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
 	 */
 	if (etmq->etm->synth_opts.last_branch &&
 	    tidq->prev_packet->sample_type == CS_ETM_RANGE) {
+		u64 addr;
+
+		/* Prepare last branches for instruction sample */
+		cs_etm__copy_last_branch_rb(etmq, tidq);
+
 		/*
 		 * Use the address of the end of the last reported execution
 		 * range.
 		 */
-		u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
+		addr = cs_etm__last_executed_instr(tidq->prev_packet);
 
 		err = cs_etm__synth_instruction_sample(
 			etmq, tidq, addr,
-- 
2.17.1



* [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero
  2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
                   ` (3 preceding siblings ...)
  2020-02-03  1:52 ` [PATCH v3 4/5] perf cs-etm: Optimize copying last branches Leo Yan
@ 2020-02-03  1:52 ` Leo Yan
  2020-02-06 11:48   ` Mike Leach
  4 siblings, 1 reply; 13+ messages in thread
From: Leo Yan @ 2020-02-03  1:52 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Mike Leach, Robert Walker, Coresight ML
  Cc: Leo Yan

The variable 'offset' in function cs_etm__instr_addr() is of type u64,
so checking it with 'while (offset > 0)' is not appropriate; this patch
changes the check to 'while (offset)'.
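
For an unsigned type the two conditions are equivalent, so the change
does not alter behaviour.  The loop shape can be sketched as below; the
size helper is a made-up stand-in for cs_etm__t32_instr_size() and its
size rule is purely an assumption for the demo.

```c
#include <stdint.h>

/*
 * Hypothetical stand-in for cs_etm__t32_instr_size(): T32 instructions
 * are 2 or 4 bytes long.  Demo-only rule: an instruction starting at an
 * 8-byte aligned address is 4 bytes, any other is 2 bytes.
 */
static uint64_t demo_t32_instr_size(uint64_t addr)
{
	return (addr % 8 == 0) ? 4 : 2;
}

/* Walk 'offset' instructions forward from 'start_addr' */
static uint64_t demo_instr_addr(uint64_t start_addr, uint64_t offset)
{
	uint64_t addr = start_addr;

	/* 'offset' is unsigned, so this is the same as 'offset > 0' */
	while (offset) {
		addr += demo_t32_instr_size(addr);
		offset--;
	}

	return addr;
}
```

Because the loop runs once per unit of 'offset', a wrapped-around
negative value would make it iterate an enormous number of times, which
is why the callers must never pass one.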

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/cs-etm.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index dbddf1eec2be..720108bd8dba 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -945,7 +945,7 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
 	if (packet->isa == CS_ETM_ISA_T32) {
 		u64 addr = packet->start_addr;
 
-		while (offset > 0) {
+		while (offset) {
 			addr += cs_etm__t32_instr_size(etmq,
 						       trace_chan_id, addr);
 			offset--;
-- 
2.17.1



* Re: [PATCH v3 1/5] perf cs-etm: Swap packets for instruction samples
  2020-02-03  1:51 ` [PATCH v3 1/5] perf cs-etm: Swap packets for " Leo Yan
@ 2020-02-05 15:59   ` Mike Leach
  2020-02-06  7:43     ` Leo Yan
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Leach @ 2020-02-05 15:59 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Leo

On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
>
> If use option '--itrace=iNNN' with Arm CoreSight trace data, perf tool
> fails inject instruction samples; the root cause is the packets are
> only switched for branch samples and last branches but not for
> instruction samples, so the new coming packets cannot be properly
> handled for only synthesizing instruction samples.
>
> To fix this issue, this patch switches packets for instruction samples.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 6 ++++--
>  1 file changed, 4 insertions(+), 2 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 5471045ebf5c..3dd5ba34a2c2 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1404,7 +1404,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                 }
>         }
>
> -       if (etm->sample_branches || etm->synth_opts.last_branch) {
> +       if (etm->sample_branches || etm->synth_opts.last_branch ||
> +           etm->sample_instructions) {
>                 /*
>                  * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>                  * the next incoming packet.
> @@ -1476,7 +1477,8 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>         }
>
>  swap_packet:
> -       if (etm->sample_branches || etm->synth_opts.last_branch) {
> +       if (etm->sample_branches || etm->synth_opts.last_branch ||
> +           etm->sample_instructions) {
>                 /*
>                  * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
>                  * the next incoming packet.
> --
> 2.17.1
>
Is it worth putting the 'if <options> { swap packet }' into a separate
function, as it appears twice in identical form? It might help if more
options for swap packet are needed later.

Either way

Reviewed-by: Mike Leach <mike.leach@linaro.org>
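
A helper along the lines suggested in this reply might look roughly like
the sketch below.  The struct definitions are cut-down, hypothetical
stand-ins that only carry the fields used here, and the helper name is
an assumption rather than an existing function in cs-etm.c.

```c
#include <stdbool.h>

/* Hypothetical, minimal stand-ins for the perf structures */
struct demo_packet {
	int id;
};

struct demo_synth_opts {
	bool last_branch;
};

struct demo_etm {
	bool sample_branches;
	bool sample_instructions;
	struct demo_synth_opts synth_opts;
};

struct demo_tidq {
	struct demo_packet *packet;
	struct demo_packet *prev_packet;
};

/*
 * Swap PACKET with PREV_PACKET when any option needs the previous
 * packet, so that both call sites share one condition.
 */
static void demo_packet_swap(const struct demo_etm *etm,
			     struct demo_tidq *tidq)
{
	struct demo_packet *tmp;

	if (etm->sample_branches || etm->synth_opts.last_branch ||
	    etm->sample_instructions) {
		tmp = tidq->packet;
		tidq->packet = tidq->prev_packet;
		tidq->prev_packet = tmp;
	}
}
```

Both cs_etm__sample() and cs_etm__flush() could then call the helper, and
a new option would only need to be added to the condition in one place.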


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK


* Re: [PATCH v3 2/5] perf cs-etm: Continuously record last branch
  2020-02-03  1:52 ` [PATCH v3 2/5] perf cs-etm: Continuously record last branch Leo Yan
@ 2020-02-05 16:01   ` Mike Leach
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Leach @ 2020-02-05 16:01 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
>
> Every time synthesize instruction sample, the last branch recording
> will be reset.  This is fine if the instruction period is big enough,
> for example if use the option '--itrace=i100000', the last branch
> array is reset for every sample with 100000 instructions per period;
> before generate the next instruction sample, there has the sufficient
> packets coming to fill the last branch array.
>
> On the other hand, if set a very small period, the packets will be
> significantly reduced between two continuous instruction samples, thus
> the last branch array is almost empty for new instruction sample by
> frequently resetting.
>
> To allow the last branches to work properly for any instruction periods,
> this patch avoids to reset the last branch for every instruction sample
> and only reset it when flush the trace data.  The last branches will
> be reset only for two cases, one is for trace starting, another case
> is for discontinuous trace; other cases can keep recording last branches
> for continuous instruction samples.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 7 ++++---
>  1 file changed, 4 insertions(+), 3 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 3dd5ba34a2c2..3e28462609e7 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1153,9 +1153,6 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>                         "CS ETM Trace: failed to deliver instruction event, error %d\n",
>                         ret);
>
> -       if (etm->synth_opts.last_branch)
> -               cs_etm__reset_last_branch_rb(tidq);
> -
>         return ret;
>  }
>
> @@ -1488,6 +1485,10 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>                 tidq->prev_packet = tmp;
>         }
>
> +       /* Reset last branches after flush the trace */
> +       if (etm->synth_opts.last_branch)
> +               cs_etm__reset_last_branch_rb(tidq);
> +
>         return err;
>  }
>
> --
> 2.17.1
>

Reviewed-by: Mike Leach <mike.leach@linaro.org>
-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK


* Re: [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples
  2020-02-03  1:52 ` [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples Leo Yan
@ 2020-02-05 16:09   ` Mike Leach
  2020-02-06  8:24     ` Leo Yan
  0 siblings, 1 reply; 13+ messages in thread
From: Mike Leach @ 2020-02-05 16:09 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Leo,

There are a couple of typos in the comments below, but I also believe
that the sample loop could be considerably simplified.

On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
>
> When 'etm->instructions_sample_period' is less than
> 'tidq->period_instructions', the function cs_etm__sample() cannot handle
> this case properly with its logic.
>
> Let's see below flow as an example:
>
> - If we set itrace option '--itrace=i4', then function cs_etm__sample()
>   has variables with initialized values:
>
>   tidq->period_instructions = 0
>   etm->instructions_sample_period = 4
>
> - When the first packet is coming:
>
>   packet->instr_count = 10; the number of instructions executed in this
>   packet is 10, thus update period_instructions as below:
>
>   tidq->period_instructions = 0 + 10 = 10
>   instrs_over = 10 - 4 = 6
>   offset = 10 - 6 - 1 = 3
>   tidq->period_instructions = instrs_over = 6
>
> - When the second packet is coming:
>
>   packet->instr_count = 10; in the second pass, assume 10 instructions
>   in the trace sample again:
>
>   tidq->period_instructions = 6 + 10 = 16
>   instrs_over = 16 - 4 = 12
>   offset = 10 - 12 - 1 = -3  -> the negative value
>   tidq->period_instructions = instrs_over = 12
>
> So after handle these two packets, there have below issues:
>
> The first issue is that cs_etm__instr_addr() returns the address within
> the current trace sample of the instruction related to offset, so the
> offset is supposed to be always unsigned value.  But in fact, function
> cs_etm__sample() might calculate a negative offset value (in handling
> the second packet, the offset is -3) and pass to cs_etm__instr_addr()
> with u64 type with a big positive integer.
>
> The second issue is it only synthesizes 2 samples for sample period = 4.
> In theory, every packet has 10 instructions so the two packets have
> total 20 instructions, 20 instructions should generate 5 samples
> (4 x 5 = 20).  This is because cs_etm__sample() only calls once
> cs_etm__synth_instruction_sample() to generate instruction sample per
> range packet.
>
> This patch fixes the logic in function cs_etm__sample(); the basic
> idea is to divide into three parts for handling coming packet:
>
> - The first part is for synthesizing the first instruction sample, it
>   combines the instructions from the tail of previous packet and the
>   instructions from the head of the new packet;
> - The second part is to simply generate samples with sample period
>   aligned;
> - The third part is the tail of new packet, the rest instructions will
>   be left for the sequential sample handling.
>
> Suggested-by: Mike Leach <mike.leach@linaro.org>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++-----
>  1 file changed, 92 insertions(+), 13 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 3e28462609e7..c5a05f728eac 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                  * TODO: allow period to be defined in cycles and clock time
>                  */
>
> -               /* Get number of instructions executed after the sample point */
> -               u64 instrs_over = tidq->period_instructions -
> -                       etm->instructions_sample_period;
> +               /*
> +                * Below diagram demonstrates the instruction samples
> +                * generation flows:
> +                *
> +                *    Instrs     Instrs       Instrs       Instrs
> +                *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
> +                *    |            |            |            |
> +                *    V            V            V            V
> +                *   --------------------------------------------------
> +                *            ^                                  ^
> +                *            |                                  |
> +                *         Period                             Period
> +                *    instructions(Pi)                   instructions(Pi')
> +                *
> +                *            |                                  |
> +                *            \---------------- -----------------/
> +                *                             V
> +                *                      instrs_executed
> +                *
> +                * Period instructions (Pi) contains the the number of
> +                * instructions executed after the sample point(n).  When a new
> +                * instruction packet is coming and generate for the next sample
> +                * (n+1), it combines with two parts instructions, one is the
> +                * tail of the old packet and another is the head of the new
> +                * coming packet.  So 'head' variable is used to cauclate the
typo : s/cauclate/calculate
> +                * instruction numbers in the new packet for sample(n+1).
> +                *
> +                * Sample(n+2) and sample(n+3) consume the instructions with
> +                * sample period, so directly generate samples based on the
> +                * sampe period.
> +                *
typo: s/sampe/sample
> +                * After sample(n+3), the rest instructions will be used by
> +                * later packet; so use 'instrs_over' to track the rest
> +                * instruction number and it is assigned to
> +                * 'tidq->period_instructions' for next round calculation.
> +                */
> +               u64 head, offset = 0;
> +               u64 addr;
>
>                 /*
> -                * Calculate the address of the sampled instruction (-1 as
> -                * sample is reported as though instruction has just been
> -                * executed, but PC has not advanced to next instruction)
> +                * 'instrs_over' is the number of instructions executed after
> +                * sample points, initialise it to 'instrs_executed' and will
> +                * decrease it for consumed instructions in every synthesized
> +                * instruction sample.
>                  */
> -               u64 offset = (instrs_executed - instrs_over - 1);
> -               u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -                                             tidq->packet, offset);
> +               u64 instrs_over = instrs_executed;
>
> -               ret = cs_etm__synth_instruction_sample(
> -                       etmq, tidq, addr, etm->instructions_sample_period);
> -               if (ret)
> -                       return ret;
> +               /*
> +                * 'head' is the instructions number of the head in the new
> +                * packet, it combines with the tail of previous packet to
> +                * generate a sample.  So 'head' uses the sample period to
> +                * decrease the instruction number introduced by the previous
> +                * packet.
> +                */
> +               head = etm->instructions_sample_period -
> +                                 (tidq->period_instructions - instrs_executed);
> +
> +               if (head) {
> +                       offset = head;
> +
> +                       /*
> +                        * Calculate the address of the sampled instruction (-1
> +                        * as sample is reported as though instruction has just
> +                        * been executed, but PC has not advanced to next
> +                        * instruction)
> +                        */
> +                       addr = cs_etm__instr_addr(etmq, trace_chan_id,
> +                                                 tidq->packet, offset - 1);
> +                       ret = cs_etm__synth_instruction_sample(
> +                               etmq, tidq, addr,
> +                               etm->instructions_sample_period);
> +                       if (ret)
> +                               return ret;
> +
> +                       instrs_over -= head;
> +               }
> +
> +               while (instrs_over >= etm->instructions_sample_period) {
> +                       offset += etm->instructions_sample_period;
> +
> +                       /*
> +                        * Calculate the address of the sampled instruction (-1
> +                        * as sample is reported as though instruction has just
> +                        * been executed, but PC has not advanced to next
> +                        * instruction)
> +                        */
> +                       addr = cs_etm__instr_addr(etmq, trace_chan_id,
> +                                                 tidq->packet, offset - 1);
> +                       ret = cs_etm__synth_instruction_sample(
> +                               etmq, tidq, addr,
> +                               etm->instructions_sample_period);
> +                       if (ret)
> +                               return ret;
> +
> +                       instrs_over -= etm->instructions_sample_period;
> +               }
>
>                 /* Carry remaining instructions into next sample period */
>                 tidq->period_instructions = instrs_over;
> --
> 2.17.1
>

I believe the following change would work and make for easier reading...

.... at the start of the function remove instrs_executed and replace ....
/* get instructions remainder from previous packet */
u64 instrs_prev = tidq->period_instructions;

/*
 * Set available instructions to previous packet remainder + the
 * current packet count.
 */
tidq->period_instructions += tidq->packet->instr_count;


.... within the if(etm->sample_instructions && ...) statement I would
be more explicit what the elements of the diagram are ....

/*
 * Below diagram demonstrates the instruction samples
 * generation flows:
 *
 *    Instrs     Instrs       Instrs       Instrs
 *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
 *    |            |            |            |
 *    V            V            V            V
 *   --------------------------------------------------
 *            ^                                  ^
 *            |                                  |
 *         Period                             Period
 *    instructions(Pi)                   instructions(Pi')
 *
 *            |                                  |
 *            \---------------- -----------------/
 *                             V
 *                      tidq->packet->instr_count;
 *
 * Instrs Sample(n...) are the synthesised samples occurring every
 * etm->instructions_sample_period instructions - as defined on the
 * perf command line. Sample(n) being the last sample before the
 * current etm packet, n+1 to n+3 samples generated from the current
 * etm packet.
 *
 * tidq->packet->instr_count represents the number of instructions in
 * the current etm packet.
 *
 * Period instructions (Pi) contains the number of instructions
 * executed after the sample point(n) from the previous etm packet.
 * This will always be less than etm->instructions_sample_period.
 *

.... continue with explanation here ....


.... then we can simplify the loop code removing some of the temporary
variables ....

/* get the initial offset into the current packet instructions
   (entry conditions ensure that instrs_prev < etm->instructions_sample_period)
 */
u64 offset = etm->instructions_sample_period - instrs_prev;
u64 addr;

/* Prepare last branches for instruction sample */
if (etm->synth_opts.last_branch)
    cs_etm__copy_last_branch_rb(etmq, tidq);

while (tidq->period_instructions >= etm->instructions_sample_period) {
    /*
     * Calculate the address of the sampled instruction (-1
     * as sample is reported as though instruction has just
     * been executed, but PC has not advanced to next
     * instruction)
     */
    addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet,
                              offset - 1);
    ret = cs_etm__synth_instruction_sample(etmq, tidq, addr,
                etm->instructions_sample_period);
    if (ret)
        return ret;

    offset += etm->instructions_sample_period;
    tidq->period_instructions -= etm->instructions_sample_period;
}

.....
I believe the above should work, but cannot claim to have tried it
out. What do you think?
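
A minimal sketch of the loop above, worked through the '--itrace=i4'
example from the commit message (two range packets of 10 instructions
each). The struct, the function name process_packet() and the 'offs'
array are stand-ins invented for illustration, not perf code; only the
arithmetic mirrors the suggestion.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 16

/* Hypothetical stand-in for the per-thread state kept in tidq. */
struct sampler {
	uint64_t period;              /* etm->instructions_sample_period */
	uint64_t period_instructions; /* remainder carried between packets */
};

/*
 * Process one range packet of 'instr_count' instructions.  Stores the
 * offset (within the packet) of every sampled instruction in 'offs' and
 * returns how many samples were generated.
 */
static size_t process_packet(struct sampler *s, uint64_t instr_count,
			     uint64_t offs[MAX_SAMPLES])
{
	uint64_t instrs_prev = s->period_instructions;
	uint64_t offset;
	size_t n = 0;

	/* Entry condition the suggestion relies on. */
	assert(instrs_prev < s->period);

	s->period_instructions += instr_count;
	offset = s->period - instrs_prev;

	while (s->period_instructions >= s->period) {
		assert(n < MAX_SAMPLES);
		offs[n++] = offset - 1;	/* -1: PC has not advanced yet */
		offset += s->period;
		s->period_instructions -= s->period;
	}
	return n;
}
```

With a period of 4, the first packet yields samples at offsets 3 and 7
and carries 2 instructions over; the second yields offsets 1, 5 and 9
with nothing left over. That gives the expected 5 samples for 20
instructions, and no offset falls outside its packet.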

Regards

Mike

-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

* Re: [PATCH v3 1/5] perf cs-etm: Swap packets for instruction samples
  2020-02-05 15:59   ` Mike Leach
@ 2020-02-06  7:43     ` Leo Yan
  0 siblings, 0 replies; 13+ messages in thread
From: Leo Yan @ 2020-02-06  7:43 UTC (permalink / raw)
  To: Mike Leach
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Hi Mike,

On Wed, Feb 05, 2020 at 03:59:40PM +0000, Mike Leach wrote:
> Hi Leo
> 
> On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > If the option '--itrace=iNNN' is used with Arm CoreSight trace data, the
> > perf tool fails to inject instruction samples.  The root cause is that
> > packets are only swapped for branch samples and last branches, not for
> > instruction samples, so newly arriving packets cannot be handled
> > properly when only instruction samples are synthesized.
> >
> > To fix this issue, this patch also swaps packets for instruction samples.
> >
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 6 ++++--
> >  1 file changed, 4 insertions(+), 2 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 5471045ebf5c..3dd5ba34a2c2 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1404,7 +1404,8 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                 }
> >         }
> >
> > -       if (etm->sample_branches || etm->synth_opts.last_branch) {
> > +       if (etm->sample_branches || etm->synth_opts.last_branch ||
> > +           etm->sample_instructions) {
> >                 /*
> >                  * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> >                  * the next incoming packet.
> > @@ -1476,7 +1477,8 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
> >         }
> >
> >  swap_packet:
> > -       if (etm->sample_branches || etm->synth_opts.last_branch) {
> > +       if (etm->sample_branches || etm->synth_opts.last_branch ||
> > +           etm->sample_instructions) {
> >                 /*
> >                  * Swap PACKET with PREV_PACKET: PACKET becomes PREV_PACKET for
> >                  * the next incoming packet.
> > --
> > 2.17.1
> >
> Is it worth putting the 'if <options> { swap packet }' into a separate
> function, as it appears twice in identical form? It might help if more
> options for swap packet are needed later.

Makes sense.  Will factor out a new function for this.
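
A rough sketch of what such a helper could look like. The structures
below are cut-down stand-ins and the name packet_swap() is only a
placeholder; the real code would take the existing perf types and
option fields.

```c
#include <assert.h>
#include <stdbool.h>

/* Simplified stand-ins for the perf structures; only the fields used here. */
struct packet {
	int id;
};

struct auxtrace_opts {
	bool sample_branches;
	bool last_branch;
	bool sample_instructions;
};

struct traceid_queue {
	struct packet *packet;
	struct packet *prev_packet;
};

/*
 * Swap PACKET with PREV_PACKET when any consumer needs the previous
 * packet, so PACKET becomes PREV_PACKET for the next incoming packet.
 */
static void packet_swap(const struct auxtrace_opts *opts,
			struct traceid_queue *tidq)
{
	struct packet *tmp;

	if (!(opts->sample_branches || opts->last_branch ||
	      opts->sample_instructions))
		return;

	tmp = tidq->packet;
	tidq->packet = tidq->prev_packet;
	tidq->prev_packet = tmp;
}
```

Both call sites would then reduce to a single call, and a new option
that needs the previous packet only has to be added in one place.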

Thanks for reviewing!
Leo

> Either way
> 
> Reviewed-by: Mike Leach <mike.leach@linaro.org>
> 
> 
> -- 
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK

* Re: [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples
  2020-02-05 16:09   ` Mike Leach
@ 2020-02-06  8:24     ` Leo Yan
  0 siblings, 0 replies; 13+ messages in thread
From: Leo Yan @ 2020-02-06  8:24 UTC (permalink / raw)
  To: Mike Leach
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

On Wed, Feb 05, 2020 at 04:09:01PM +0000, Mike Leach wrote:
> Hi Leo,
> 
> There are a couple of typos in the comments below, but I also believe
> that the sample loop could be considerably simplified
> 
> On Mon, 3 Feb 2020 at 01:52, Leo Yan <leo.yan@linaro.org> wrote:
> >
> > When 'etm->instructions_sample_period' is less than
> > 'tidq->period_instructions', the function cs_etm__sample() cannot handle
> > this case properly with its logic.
> >
> > Consider the following flow as an example:
> >
> > - If we set the itrace option '--itrace=i4', then function
> >   cs_etm__sample() starts with these initial values:
> >
> >   tidq->period_instructions = 0
> >   etm->instructions_sample_period = 4
> >
> > - When the first packet arrives:
> >
> >   packet->instr_count = 10; the number of instructions executed in this
> >   packet is 10, so period_instructions is updated as below:
> >
> >   tidq->period_instructions = 0 + 10 = 10
> >   instrs_over = 10 - 4 = 6
> >   offset = 10 - 6 - 1 = 3
> >   tidq->period_instructions = instrs_over = 6
> >
> > - When the second packet arrives:
> >
> >   packet->instr_count = 10; in the second pass, assume 10 instructions
> >   in the trace sample again:
> >
> >   tidq->period_instructions = 6 + 10 = 16
> >   instrs_over = 16 - 4 = 12
> >   offset = 10 - 12 - 1 = -3  -> the negative value
> >   tidq->period_instructions = instrs_over = 12
> >
> > So after handling these two packets, there are the following issues:
> >
> > The first issue is that cs_etm__instr_addr() returns the address within
> > the current trace sample of the instruction related to offset, so the
> > offset is supposed to always be an unsigned value.  But in fact,
> > cs_etm__sample() might calculate a negative offset (when handling the
> > second packet, the offset is -3) and pass it to cs_etm__instr_addr()
> > as a u64, where it becomes a very large positive integer.
> >
> > The second issue is that only 2 samples are synthesized for sample
> > period = 4.  In theory, every packet has 10 instructions, so the two
> > packets have 20 instructions in total, and 20 instructions should
> > generate 5 samples (4 x 5 = 20).  This is because cs_etm__sample()
> > calls cs_etm__synth_instruction_sample() only once per range packet.
> >
> > This patch fixes the logic in function cs_etm__sample(); the basic
> > idea is to handle an incoming packet in three parts:
> >
> > - The first part synthesizes the first instruction sample; it
> >   combines the instructions from the tail of the previous packet with
> >   the instructions from the head of the new packet;
> > - The second part simply generates samples aligned to the sample
> >   period;
> > - The third part is the tail of the new packet; the remaining
> >   instructions are left to be handled with the subsequent packet.
> >
> > Suggested-by: Mike Leach <mike.leach@linaro.org>
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/cs-etm.c | 105 ++++++++++++++++++++++++++++++++++-----
> >  1 file changed, 92 insertions(+), 13 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 3e28462609e7..c5a05f728eac 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1360,23 +1360,102 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >                  * TODO: allow period to be defined in cycles and clock time
> >                  */
> >
> > -               /* Get number of instructions executed after the sample point */
> > -               u64 instrs_over = tidq->period_instructions -
> > -                       etm->instructions_sample_period;
> > +               /*
> > +                * Below diagram demonstrates the instruction samples
> > +                * generation flows:
> > +                *
> > +                *    Instrs     Instrs       Instrs       Instrs
> > +                *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
> > +                *    |            |            |            |
> > +                *    V            V            V            V
> > +                *   --------------------------------------------------
> > +                *            ^                                  ^
> > +                *            |                                  |
> > +                *         Period                             Period
> > +                *    instructions(Pi)                   instructions(Pi')
> > +                *
> > +                *            |                                  |
> > +                *            \---------------- -----------------/
> > +                *                             V
> > +                *                      instrs_executed
> > +                *
> > +                * Period instructions (Pi) contains the the number of
> > +                * instructions executed after the sample point(n).  When a new
> > +                * instruction packet is coming and generate for the next sample
> > +                * (n+1), it combines with two parts instructions, one is the
> > +                * tail of the old packet and another is the head of the new
> > +                * coming packet.  So 'head' variable is used to cauclate the
> typo : s/cauclate/calculate

I ran checkpatch.pl but it didn't complain about this.

Thanks for pointing it out; will fix it.

> > +                * instruction numbers in the new packet for sample(n+1).
> > +                *
> > +                * Sample(n+2) and sample(n+3) consume the instructions with
> > +                * sample period, so directly generate samples based on the
> > +                * sampe period.
> > +                *
> typo: s/sampe/sample

Will fix.

> > +                * After sample(n+3), the rest instructions will be used by
> > +                * later packet; so use 'instrs_over' to track the rest
> > +                * instruction number and it is assigned to
> > +                * 'tidq->period_instructions' for next round calculation.
> > +                */
> > +               u64 head, offset = 0;
> > +               u64 addr;
> >
> >                 /*
> > -                * Calculate the address of the sampled instruction (-1 as
> > -                * sample is reported as though instruction has just been
> > -                * executed, but PC has not advanced to next instruction)
> > +                * 'instrs_over' is the number of instructions executed after
> > +                * sample points, initialise it to 'instrs_executed' and will
> > +                * decrease it for consumed instructions in every synthesized
> > +                * instruction sample.
> >                  */
> > -               u64 offset = (instrs_executed - instrs_over - 1);
> > -               u64 addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -                                             tidq->packet, offset);
> > +               u64 instrs_over = instrs_executed;
> >
> > -               ret = cs_etm__synth_instruction_sample(
> > -                       etmq, tidq, addr, etm->instructions_sample_period);
> > -               if (ret)
> > -                       return ret;
> > +               /*
> > +                * 'head' is the instructions number of the head in the new
> > +                * packet, it combines with the tail of previous packet to
> > +                * generate a sample.  So 'head' uses the sample period to
> > +                * decrease the instruction number introduced by the previous
> > +                * packet.
> > +                */
> > +               head = etm->instructions_sample_period -
> > +                                 (tidq->period_instructions - instrs_executed);
> > +
> > +               if (head) {
> > +                       offset = head;
> > +
> > +                       /*
> > +                        * Calculate the address of the sampled instruction (-1
> > +                        * as sample is reported as though instruction has just
> > +                        * been executed, but PC has not advanced to next
> > +                        * instruction)
> > +                        */
> > +                       addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > +                                                 tidq->packet, offset - 1);
> > +                       ret = cs_etm__synth_instruction_sample(
> > +                               etmq, tidq, addr,
> > +                               etm->instructions_sample_period);
> > +                       if (ret)
> > +                               return ret;
> > +
> > +                       instrs_over -= head;
> > +               }
> > +
> > +               while (instrs_over >= etm->instructions_sample_period) {
> > +                       offset += etm->instructions_sample_period;
> > +
> > +                       /*
> > +                        * Calculate the address of the sampled instruction (-1
> > +                        * as sample is reported as though instruction has just
> > +                        * been executed, but PC has not advanced to next
> > +                        * instruction)
> > +                        */
> > +                       addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > +                                                 tidq->packet, offset - 1);
> > +                       ret = cs_etm__synth_instruction_sample(
> > +                               etmq, tidq, addr,
> > +                               etm->instructions_sample_period);
> > +                       if (ret)
> > +                               return ret;
> > +
> > +                       instrs_over -= etm->instructions_sample_period;
> > +               }
> >
> >                 /* Carry remaining instructions into next sample period */
> >                 tidq->period_instructions = instrs_over;
> > --
> > 2.17.1
> >
> 
> I believe the following change would work and make for easier reading...
> 
> .... at the start of the function remove instrs_executed and replace ....
> /* get instructions remainder from previous packet */
> u64 instrs_prev = tidq->period_instructions;
> 
> /*
>  * Set available instructions to previous packet remainder + the
>  * current packet count.
>  */
> tidq->period_instructions += tidq->packet->instr_count;
> 
> 
> .... within the if(etm->sample_instructions && ...) statement I would
> be more explicit what the elements of the diagram are ....
> 
> /*
>  * Below diagram demonstrates the instruction samples
>  * generation flows:
>  *
>  *    Instrs     Instrs       Instrs       Instrs
>  *   Sample(n)  Sample(n+1)  Sample(n+2)  Sample(n+3)
>  *    |            |            |            |
>  *    V            V            V            V
>  *   --------------------------------------------------
>  *            ^                                  ^
>  *            |                                  |
>  *         Period                             Period
>  *    instructions(Pi)                   instructions(Pi')
>  *
>  *            |                                  |
>  *            \---------------- -----------------/
>  *                             V
>  *                      tidq->packet->instr_count;
>  *
>  * Instrs Sample(n...) are the synthesised samples occurring every
>  * etm->instructions_sample_period instructions - as defined on the
>  * perf command line. Sample(n) being the last sample before the
>  * current etm packet, n+1 to n+3 samples generated from the current
>  * etm packet.
>  *
>  * tidq->packet->instr_count represents the number of instructions in
>  * the current etm packet.
>  *
>  * Period instructions (Pi) contains the number of instructions
>  * executed after the sample point(n) from the previous etm packet.
>  * This will always be less than etm->instructions_sample_period.
>  *
> 
> .... continue with explanation here ....
> 
> 
> .... then we can simplify the loop code removing some of the temporary
> variables ....
> 
> /* get the initial offset into the current packet instructions
>    (entry conditions ensure that instrs_prev < etm->instructions_sample_period)
>  */
> u64 offset = etm->instructions_sample_period - instrs_prev;
> u64 addr;
> 
> /* Prepare last branches for instruction sample */
> if (etm->synth_opts.last_branch)
>     cs_etm__copy_last_branch_rb(etmq, tidq);
> 
> while (tidq->period_instructions >= etm->instructions_sample_period) {
>     /*
>      * Calculate the address of the sampled instruction (-1
>      * as sample is reported as though instruction has just
>      * been executed, but PC has not advanced to next
>      * instruction)
>      */
>     addr = cs_etm__instr_addr(etmq, trace_chan_id, tidq->packet,
>                               offset - 1);
>     ret = cs_etm__synth_instruction_sample(etmq, tidq, addr,
>                 etm->instructions_sample_period);
>     if (ret)
>         return ret;
>
>     offset += etm->instructions_sample_period;
>     tidq->period_instructions -= etm->instructions_sample_period;
> }
> 
> .....
> I believe the above should work, but cannot claim to have tried it
> out. What do you think?

Agreed.  To be honest, I considered using the approach you suggest, but I
was worried about the boundary conditions for 'offset', so I went back to
the explicit method with two code segments (head and sequential samples).

After reviewing the suggested code, I don't see any issues.  I will rework
the code this way and test it.

I really appreciate the suggestions :)
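
The boundary conditions for 'offset' can also be checked outside perf
with a small brute-force comparison of the two shapes. This is only a
sketch under assumptions: the function and struct names are invented
here, 'prev' stands for the remainder carried in
tidq->period_instructions (always below the period), 'count' for
packet->instr_count, and max_count must not exceed MAX_SAMPLES.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_SAMPLES 128

struct result {
	uint64_t offs[MAX_SAMPLES]; /* offsets handed to the address lookup */
	size_t n;                   /* samples generated for this packet */
	uint64_t remainder;         /* instructions carried to the next packet */
};

static void emit(struct result *r, uint64_t off)
{
	assert(r->n < MAX_SAMPLES);
	r->offs[r->n++] = off;
}

/* v3 shape: one explicit head sample, then period-aligned samples. */
static void two_segments(uint64_t prev, uint64_t count, uint64_t period,
			 struct result *r)
{
	uint64_t instrs_over = count;
	uint64_t head = period - prev;
	uint64_t offset = 0;

	r->n = 0;
	r->remainder = prev + count;
	if (r->remainder < period)
		return;		/* not enough instructions for a sample */

	if (head) {
		offset = head;
		emit(r, offset - 1);
		instrs_over -= head;
	}
	while (instrs_over >= period) {
		offset += period;
		emit(r, offset - 1);
		instrs_over -= period;
	}
	r->remainder = instrs_over;
}

/* Suggested shape: a single loop driven by the running total. */
static void single_loop(uint64_t prev, uint64_t count, uint64_t period,
			struct result *r)
{
	uint64_t offset = period - prev;

	r->n = 0;
	r->remainder = prev + count;
	while (r->remainder >= period) {
		emit(r, offset - 1);
		offset += period;
		r->remainder -= period;
	}
}

/* Count the (period, prev, count) cases where the two shapes disagree. */
static unsigned int count_mismatches(uint64_t max_period, uint64_t max_count)
{
	static struct result a, b;
	unsigned int bad = 0;
	uint64_t period, prev, count;
	size_t i;

	for (period = 1; period <= max_period; period++)
		for (prev = 0; prev < period; prev++)
			for (count = 1; count <= max_count; count++) {
				int ok;

				two_segments(prev, count, period, &a);
				single_loop(prev, count, period, &b);
				ok = a.n == b.n && a.remainder == b.remainder;
				/* Same offsets, and each inside the packet. */
				for (i = 0; ok && i < a.n; i++)
					ok = a.offs[i] == b.offs[i] &&
					     b.offs[i] < count;
				bad += !ok;
			}
	return bad;
}
```

A call such as count_mismatches(16, 64) compares every period up to 16
against every packet size up to 64. A result of zero means both shapes
produce the same offsets and the same remainder, and that 'offset - 1'
never points past the end of the packet.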

Leo

* Re: [PATCH v3 4/5] perf cs-etm: Optimize copying last branches
  2020-02-03  1:52 ` [PATCH v3 4/5] perf cs-etm: Optimize copying last branches Leo Yan
@ 2020-02-06 11:47   ` Mike Leach
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Leach @ 2020-02-06 11:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Reviewed-by: Mike Leach <mike.leach@linaro.org>

On Mon, 3 Feb 2020 at 01:53, Leo Yan <leo.yan@linaro.org> wrote:
>
> If an instruction range packet can generate multiple instruction
> samples, these samples share the same last branches; it's not necessary
> to copy the same last branches repeatedly for these samples within the
> same packet.
>
> This patch moves the copying of last branches out of function
> cs_etm__synth_instruction_sample() and performs it before generating
> instruction samples.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 22 +++++++++++++++++-----
>  1 file changed, 17 insertions(+), 5 deletions(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index c5a05f728eac..dbddf1eec2be 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1134,10 +1134,8 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>
>         cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>
> -       if (etm->synth_opts.last_branch) {
> -               cs_etm__copy_last_branch_rb(etmq, tidq);
> +       if (etm->synth_opts.last_branch)
>                 sample.branch_stack = tidq->last_branch;
> -       }
>
>         if (etm->synth_opts.inject) {
>                 ret = cs_etm__inject_event(event, &sample,
> @@ -1407,6 +1405,10 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>                  */
>                 u64 instrs_over = instrs_executed;
>
> +               /* Prepare last branches for instruction sample */
> +               if (etm->synth_opts.last_branch)
> +                       cs_etm__copy_last_branch_rb(etmq, tidq);
> +
>                 /*
>                  * 'head' is the instructions number of the head in the new
>                  * packet, it combines with the tail of previous packet to
> @@ -1526,6 +1528,11 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>
>         if (etmq->etm->synth_opts.last_branch &&
>             tidq->prev_packet->sample_type == CS_ETM_RANGE) {
> +               u64 addr;
> +
> +               /* Prepare last branches for instruction sample */
> +               cs_etm__copy_last_branch_rb(etmq, tidq);
> +
>                 /*
>                  * Generate a last branch event for the branches left in the
>                  * circular buffer at the end of the trace.
> @@ -1533,7 +1540,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>                  * Use the address of the end of the last reported execution
>                  * range
>                  */
> -               u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
> +               addr = cs_etm__last_executed_instr(tidq->prev_packet);
>
>                 err = cs_etm__synth_instruction_sample(
>                         etmq, tidq, addr,
> @@ -1587,11 +1594,16 @@ static int cs_etm__end_block(struct cs_etm_queue *etmq,
>          */
>         if (etmq->etm->synth_opts.last_branch &&
>             tidq->prev_packet->sample_type == CS_ETM_RANGE) {
> +               u64 addr;
> +
> +               /* Prepare last branches for instruction sample */
> +               cs_etm__copy_last_branch_rb(etmq, tidq);
> +
>                 /*
>                  * Use the address of the end of the last reported execution
>                  * range.
>                  */
> -               u64 addr = cs_etm__last_executed_instr(tidq->prev_packet);
> +               addr = cs_etm__last_executed_instr(tidq->prev_packet);
>
>                 err = cs_etm__synth_instruction_sample(
>                         etmq, tidq, addr,
> --
> 2.17.1
>
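
To illustrate the idea of the patch, here is a self-contained sketch
that linearises a circular branch buffer once per packet rather than
once per sample. Everything in it (struct tid_state, NR, the function
names, the oldest-to-newest ordering) is a simplified stand-in and not
the layout perf actually uses for the last branch buffer.

```c
#include <assert.h>
#include <stdint.h>

#define NR 4

/* Hypothetical per-thread state: a ring of branches plus a linear copy. */
struct tid_state {
	uint64_t ring[NR];   /* circular buffer of branch addresses */
	unsigned int pos;    /* index of the next slot to write */
	uint64_t linear[NR]; /* oldest-to-newest copy shared by samples */
	unsigned int copies; /* how many times the ring was linearised */
};

static void record_branch(struct tid_state *s, uint64_t addr)
{
	s->ring[s->pos] = addr;
	s->pos = (s->pos + 1) % NR;
}

/* Unroll the ring into 'linear'; this is the work worth doing only once. */
static void copy_last_branches(struct tid_state *s)
{
	unsigned int i;

	for (i = 0; i < NR; i++)
		s->linear[i] = s->ring[(s->pos + i) % NR];
	s->copies++;
}

/*
 * Generate all samples for one packet.  The copy happens once, before
 * the loop, and every sample of the packet refers to the same 'linear'
 * array.  Returns the number of samples generated.
 */
static unsigned int synth_packet(struct tid_state *s, uint64_t *avail,
				 uint64_t period)
{
	unsigned int n = 0;

	copy_last_branches(s);
	while (*avail >= period) {
		n++;		/* a sample would point at s->linear here */
		*avail -= period;
	}
	return n;
}
```

The number of copies then depends on the number of packets, not on how
small the sample period is.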


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

* Re: [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero
  2020-02-03  1:52 ` [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
@ 2020-02-06 11:48   ` Mike Leach
  0 siblings, 0 replies; 13+ messages in thread
From: Mike Leach @ 2020-02-06 11:48 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Peter Zijlstra, Ingo Molnar, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-kernel,
	Robert Walker, Coresight ML

Reviewed-by: Mike Leach <mike.leach@linaro.org>

On Mon, 3 Feb 2020 at 01:53, Leo Yan <leo.yan@linaro.org> wrote:
>
> The variable 'offset' in function cs_etm__instr_addr() is of u64 type,
> so checking it with 'while (offset > 0)' is not appropriate; this patch
> changes the check to 'while (offset)'.
>
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/cs-etm.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index dbddf1eec2be..720108bd8dba 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -945,7 +945,7 @@ static inline u64 cs_etm__instr_addr(struct cs_etm_queue *etmq,
>         if (packet->isa == CS_ETM_ISA_T32) {
>                 u64 addr = packet->start_addr;
>
> -               while (offset > 0) {
> +               while (offset) {
>                         addr += cs_etm__t32_instr_size(etmq,
>                                                        trace_chan_id, addr);
>                         offset--;
> --
> 2.17.1
>
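
For an unsigned type the two loop conditions behave identically, so the
change is about intent rather than behaviour: '> 0' can read like a
guard against negative values, which a u64 cannot provide. A small
sketch follows (function names are made up for illustration). It also
reproduces the '10 - 12 - 1' case from the patch 3 commit message to
show what a "negative" offset looks like once it is held in a u64.

```c
#include <assert.h>
#include <stdint.h>

/* Count iterations of a countdown loop written as 'offset > 0'. */
static uint64_t steps_gt_zero(uint64_t offset)
{
	uint64_t steps = 0;

	while (offset > 0) {
		offset--;
		steps++;
	}
	return steps;
}

/* The same loop written as 'offset', as in the patch. */
static uint64_t steps_nonzero(uint64_t offset)
{
	uint64_t steps = 0;

	while (offset) {
		offset--;
		steps++;
	}
	return steps;
}

/* The pre-series offset computation, carried out in u64 arithmetic. */
static uint64_t old_offset(uint64_t instrs_executed, uint64_t instrs_over)
{
	return instrs_executed - instrs_over - 1;
}
```

old_offset(10, 6) gives 3 as in the first packet of the example, while
old_offset(10, 12) wraps around to UINT64_MAX - 2. That value still
satisfies 'offset > 0', so the comparison never protected against it.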


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

Thread overview: 13+ messages
2020-02-03  1:51 [PATCH v3 0/5] perf cs-etm: Fix synthesizing instruction samples Leo Yan
2020-02-03  1:51 ` [PATCH v3 1/5] perf cs-etm: Swap packets for " Leo Yan
2020-02-05 15:59   ` Mike Leach
2020-02-06  7:43     ` Leo Yan
2020-02-03  1:52 ` [PATCH v3 2/5] perf cs-etm: Continuously record last branch Leo Yan
2020-02-05 16:01   ` Mike Leach
2020-02-03  1:52 ` [PATCH v3 3/5] perf cs-etm: Correct synthesizing instruction samples Leo Yan
2020-02-05 16:09   ` Mike Leach
2020-02-06  8:24     ` Leo Yan
2020-02-03  1:52 ` [PATCH v3 4/5] perf cs-etm: Optimize copying last branches Leo Yan
2020-02-06 11:47   ` Mike Leach
2020-02-03  1:52 ` [PATCH v3 5/5] perf cs-etm: Fix unsigned variable comparison to zero Leo Yan
2020-02-06 11:48   ` Mike Leach
