linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/1] Fix Coresight instruction synthesis logic
@ 2023-06-23 18:22 Tanmay Jagdale
  2023-06-23 18:22 ` [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
  0 siblings, 1 reply; 5+ messages in thread
From: Tanmay Jagdale @ 2023-06-23 18:22 UTC (permalink / raw)
  To: john.g.garry, will, mike.leach, suzuki.poulose, peterz, mingo,
	acme, mark.rutland, irogers, adrian.hunter
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, sgoutham,
	gcherian, lcherian, areddy3, Tanmay Jagdale

When we use perf to catpure Coresight trace and generate instruction
trace using 'perf script', we get the following output:

# perf record -e cs_etm/@tmc_etr0/ -C 9 taskset -c 9 sleep 1
# perf script --itrace=i1ns --ns -Fcomm,tid,pid,time,cpu,event,ip,sym,addr,symoff,flags,callindent
...
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed54 coresight_timeout+0x28
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed58 coresight_timeout+0x2c
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed5c coresight_timeout+0x30
 perf  9024/9024  [009]  2690.650470551:      instructions:   call                                 0 ffffb305591aed60 coresight_timeout+0x34
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed7c coresight_timeout+0x50
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed80 coresight_timeout+0x54
 perf  9024/9024  [009]  2690.650470551:      instructions:   jmp                                  0 ffffb305591aed84 coresight_timeout+0x58
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aede4 coresight_timeout+0xb8
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aede8 coresight_timeout+0xbc
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aedec coresight_timeout+0xc0
 perf  9024/9024  [009]  2690.650470552:      instructions:   jcc                                  0 ffffb305591aedf0 coresight_timeout+0xc4
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccec ete_sysreg_read+0x0
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf0 ete_sysreg_read+0x4
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf4 ete_sysreg_read+0x8
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccf8 ete_sysreg_read+0xc
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bccfc ete_sysreg_read+0x10
 perf  9024/9024  [009]  2690.650470557:      instructions:   call                                 0 ffffb305591bcd00 ete_sysreg_read+0x14

This output has the following issues:
1. Non-branch instructions have mnemonics of branch instructions (Column 6)
2. Branch target address is missing (Column 7)

This patch fixes these issues by changing the logic of instruction syntehsis
for the Coresight trace queues.

Output after applying the patch:
 ...
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed54 coresight_timeout+0x28
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed58 coresight_timeout+0x2c
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed5c coresight_timeout+0x30
 perf  6111/6111  [008]   457.332794461:      instructions:   jmp                   ffffb305591aed7c ffffb305591aed60 coresight_timeout+0x34
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed7c coresight_timeout+0x50
 perf  6111/6111  [008]   457.332794461:      instructions:                                        0 ffffb305591aed80 coresight_timeout+0x54
 perf  6111/6111  [008]   457.332794461:      instructions:   jcc                   ffffb305591aede4 ffffb305591aed84 coresight_timeout+0x58
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aede4 coresight_timeout+0xb8
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aede8 coresight_timeout+0xbc
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591aedec coresight_timeout+0xc0
 perf  6111/6111  [008]   457.332794462:      instructions:   call                  ffffb305591bccec ffffb305591aedf0 coresight_timeout+0xc4
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccec ete_sysreg_read+0x0
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf0 ete_sysreg_read+0x4
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf4 ete_sysreg_read+0x8
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccf8 ete_sysreg_read+0xc
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bccfc ete_sysreg_read+0x10
 perf  6111/6111  [008]   457.332794462:      instructions:                                        0 ffffb305591bcd00 ete_sysreg_read+0x14

Tanmay Jagdale (1):
  perf: cs-etm: Fixes in instruction sample synthesis

 tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

-- 
2.34.1


^ permalink raw reply	[flat|nested] 5+ messages in thread

* [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis
  2023-06-23 18:22 [PATCH 0/1] Fix Coresight instruction synthesis logic Tanmay Jagdale
@ 2023-06-23 18:22 ` Tanmay Jagdale
  2023-06-26 10:20   ` James Clark
  0 siblings, 1 reply; 5+ messages in thread
From: Tanmay Jagdale @ 2023-06-23 18:22 UTC (permalink / raw)
  To: john.g.garry, will, mike.leach, suzuki.poulose, peterz, mingo,
	acme, mark.rutland, irogers, adrian.hunter
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, sgoutham,
	gcherian, lcherian, areddy3, Tanmay Jagdale

The existing method of synthesizing instruction samples has the
following issues:
1. Non-branch instructions have mnemonics of branch instructions.
2. Branch target address is missing.

Set the sample flags only when we reach the last instruction in
the tidq (which would be a branch instruction) to solve issue 1).

To fix issue 2), start synthesizing the instructions from the
previous packet (tidq->prev_packet) instead of current packet
(tidq->packet). This way, it is easy to figure out the target
address of the branch instruction in tidq->prev_packet which
is the current packet's (tidq->packet) first executed instruction.

After the switch to processing the previous packet first, we no
longer need to swap the packets during cs_etm__flush().

Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
---
 tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
 1 file changed, 25 insertions(+), 7 deletions(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 91299cc56bf7..446e00d98fd5 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -1418,10 +1418,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
 	sample.stream_id = etmq->etm->instructions_id;
 	sample.period = period;
 	sample.cpu = tidq->packet->cpu;
-	sample.flags = tidq->prev_packet->flags;
 	sample.cpumode = event->sample.header.misc;
 
-	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
+	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
+
+	/* Populate branch target information only when we encounter
+	 * branch instruction, which is at the end of tidq->prev_packet.
+	 */
+	if (addr == (tidq->prev_packet->end_addr - 4)) {
+		/* Update the perf_sample flags using the prev_packet
+		 * since that is the queue we are synthesizing.
+		 */
+		sample.flags = tidq->prev_packet->flags;
+
+		/* The last instruction of the previous queue would be a
+		 * branch operation. Get the target of that branch by looking
+		 * into the first executed instruction of the current packet
+		 * queue.
+		 */
+		sample.addr = cs_etm__first_executed_instr(tidq->packet);
+	}
 
 	if (etm->synth_opts.last_branch)
 		sample.branch_stack = tidq->last_branch;
@@ -1641,7 +1657,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 	/* Get instructions remainder from previous packet */
 	instrs_prev = tidq->period_instructions;
 
-	tidq->period_instructions += tidq->packet->instr_count;
+	tidq->period_instructions += tidq->prev_packet->instr_count;
 
 	/*
 	 * Record a branch when the last instruction in
@@ -1721,8 +1737,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
 			 * been executed, but PC has not advanced to next
 			 * instruction)
 			 */
+			/* Get address from prev_packet since we are synthesizing
+			 * that in cs_etm__synth_instruction_sample()
+			 */
 			addr = cs_etm__instr_addr(etmq, trace_chan_id,
-						  tidq->packet, offset - 1);
+						  tidq->prev_packet, offset - 1);
 			ret = cs_etm__synth_instruction_sample(
 				etmq, tidq, addr,
 				etm->instructions_sample_period);
@@ -1786,7 +1805,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 
 	/* Handle start tracing packet */
 	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
-		goto swap_packet;
+		goto reset_last_br;
 
 	if (etmq->etm->synth_opts.last_branch &&
 	    etmq->etm->synth_opts.instructions &&
@@ -1822,8 +1841,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
 			return err;
 	}
 
-swap_packet:
-	cs_etm__packet_swap(etm, tidq);
+reset_last_br:
 
 	/* Reset last branches after flush the trace */
 	if (etm->synth_opts.last_branch)
-- 
2.34.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis
  2023-06-23 18:22 ` [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
@ 2023-06-26 10:20   ` James Clark
  2023-06-28  8:01     ` Tanmay Jagdale
  0 siblings, 1 reply; 5+ messages in thread
From: James Clark @ 2023-06-26 10:20 UTC (permalink / raw)
  To: Tanmay Jagdale
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel, sgoutham,
	gcherian, lcherian, areddy3, john.g.garry, will, mike.leach,
	suzuki.poulose, peterz, mingo, acme, mark.rutland, irogers,
	adrian.hunter, coresight



On 23/06/2023 19:22, Tanmay Jagdale wrote:
> The existing method of synthesizing instruction samples has the
> following issues:
> 1. Non-branch instructions have mnemonics of branch instructions.
> 2. Branch target address is missing.
> 
> Set the sample flags only when we reach the last instruction in
> the tidq (which would be a branch instruction) to solve issue 1).
> 
> To fix issue 2), start synthesizing the instructions from the
> previous packet (tidq->prev_packet) instead of current packet
> (tidq->packet). This way, it is easy to figure out the target
> address of the branch instruction in tidq->prev_packet which
> is the current packet's (tidq->packet) first executed instruction.
> 
> After the switch to processing the previous packet first, we no
> longer need to swap the packets during cs_etm__flush().

Hi Tanmay,

I think the fix for setting the right flags and instruction type makes
sense, but is it possible to do it without the change to swapping in
cs_etm__flush() or some of the other changes to
cs_etm__synth_instruction_sample()?

I'm seeing some differences in the output related to the PID that's
assigned to a sample and some of the addresses that aren't explained by
the commit message. Also there is no corresponding change to
cs_etm__synth_branch_sample(), which is also using prev_packet etc so
I'm wondering if that's correct now without the swap? That function gets
used with the default itrace options or itrace=b

For example if I run 'perf script --itrace=i100ns' and diff the output
before and after your change I see a difference even though branch and
instruction info isn't printed, so I wouldn't expect to see any changes.
This is on a systemwide recording of a system under load.

Thanks
James

> 
> Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
> ---
>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
>  1 file changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 91299cc56bf7..446e00d98fd5 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -1418,10 +1418,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>  	sample.stream_id = etmq->etm->instructions_id;
>  	sample.period = period;
>  	sample.cpu = tidq->packet->cpu;
> -	sample.flags = tidq->prev_packet->flags;
>  	sample.cpumode = event->sample.header.misc;
>  
> -	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> +	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
> +
> +	/* Populate branch target information only when we encounter
> +	 * branch instruction, which is at the end of tidq->prev_packet.
> +	 */
> +	if (addr == (tidq->prev_packet->end_addr - 4)) {
> +		/* Update the perf_sample flags using the prev_packet
> +		 * since that is the queue we are synthesizing.
> +		 */
> +		sample.flags = tidq->prev_packet->flags;
> +
> +		/* The last instruction of the previous queue would be a
> +		 * branch operation. Get the target of that branch by looking
> +		 * into the first executed instruction of the current packet
> +		 * queue.
> +		 */
> +		sample.addr = cs_etm__first_executed_instr(tidq->packet);
> +	}
>  
>  	if (etm->synth_opts.last_branch)
>  		sample.branch_stack = tidq->last_branch;
> @@ -1641,7 +1657,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  	/* Get instructions remainder from previous packet */
>  	instrs_prev = tidq->period_instructions;
>  
> -	tidq->period_instructions += tidq->packet->instr_count;
> +	tidq->period_instructions += tidq->prev_packet->instr_count;
>  
>  	/*
>  	 * Record a branch when the last instruction in
> @@ -1721,8 +1737,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>  			 * been executed, but PC has not advanced to next
>  			 * instruction)
>  			 */
> +			/* Get address from prev_packet since we are synthesizing
> +			 * that in cs_etm__synth_instruction_sample()
> +			 */
>  			addr = cs_etm__instr_addr(etmq, trace_chan_id,
> -						  tidq->packet, offset - 1);
> +						  tidq->prev_packet, offset - 1);
>  			ret = cs_etm__synth_instruction_sample(
>  				etmq, tidq, addr,
>  				etm->instructions_sample_period);
> @@ -1786,7 +1805,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  
>  	/* Handle start tracing packet */
>  	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
> -		goto swap_packet;
> +		goto reset_last_br;
>  
>  	if (etmq->etm->synth_opts.last_branch &&
>  	    etmq->etm->synth_opts.instructions &&
> @@ -1822,8 +1841,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>  			return err;
>  	}
>  
> -swap_packet:
> -	cs_etm__packet_swap(etm, tidq);
> +reset_last_br:
>  
>  	/* Reset last branches after flush the trace */
>  	if (etm->synth_opts.last_branch)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis
  2023-06-26 10:20   ` James Clark
@ 2023-06-28  8:01     ` Tanmay Jagdale
  2023-06-28 13:27       ` James Clark
  0 siblings, 1 reply; 5+ messages in thread
From: Tanmay Jagdale @ 2023-06-28  8:01 UTC (permalink / raw)
  To: James Clark
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel,
	Sunil Kovvuri Goutham, George Cherian, Linu Cherian,
	Anil Kumar Reddy H, john.g.garry, will, mike.leach,
	suzuki.poulose, peterz, mingo, acme, mark.rutland, irogers,
	adrian.hunter, coresight

Hi James,

> On 23/06/2023 19:22, Tanmay Jagdale wrote:
> > The existing method of synthesizing instruction samples has the
> > following issues:
> > 1. Non-branch instructions have mnemonics of branch instructions.
> > 2. Branch target address is missing.
> >
> > Set the sample flags only when we reach the last instruction in
> > the tidq (which would be a branch instruction) to solve issue 1).
> >
> > To fix issue 2), start synthesizing the instructions from the
> > previous packet (tidq->prev_packet) instead of current packet
> > (tidq->packet). This way, it is easy to figure out the target
> > address of the branch instruction in tidq->prev_packet which
> > is the current packet's (tidq->packet) first executed instruction.
> >
> > After the switch to processing the previous packet first, we no
> > longer need to swap the packets during cs_etm__flush().
> 
> Hi Tanmay,
> 
> I think the fix for setting the right flags and instruction type makes
> sense, but is it possible to do it without the change to swapping in
> cs_etm__flush() or some of the other changes to
> cs_etm__synth_instruction_sample()?
Thanks for the review. I took this approach of swapping because
it would be easy to figure out the target address of the branch.
If we don't swap the packet synthesis, then we must decode the
actual instruction to get the branch target address.
IMHO, this would be a complex change.

Since the swapping approach is meant to solve issue 2) I will
split the patch while posting the next version.

> 
> I'm seeing some differences in the output related to the PID that's
> assigned to a sample and some of the addresses that aren't explained by
> the commit message. Also there is no corresponding change to
> cs_etm__synth_branch_sample(), which is also using prev_packet etc so
> I'm wondering if that's correct now without the swap? That function gets
> used with the default itrace options or itrace=b
IMHO the existing way to handle itrace=b or itrace is correct and
does not need any change since we are interested in only the last and
first instruction from tidq->prev_packet and tidq->packet respectively,
to generate the branching information.

> 
> For example if I run 'perf script --itrace=i100ns' and diff the output
> before and after your change I see a difference even though branch and
> instruction info isn't printed, so I wouldn't expect to see any changes.
> This is on a systemwide recording of a system under load.
Yes, the "tr start" mnemonic isn't printed in the flags column, but
rest of columns are the same. Will fix this in the next version.

If you have observed any other differences, can you please share them ?

Also, can you please share the test case commands so that I can run them
before submitting the next version ?

Thanks and regards,
Tanmay
> 
> Thanks
> James
> 
> >
> > Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
> > ---
> >  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
> >  1 file changed, 25 insertions(+), 7 deletions(-)
> >
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 91299cc56bf7..446e00d98fd5 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -1418,10 +1418,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
> >  	sample.stream_id = etmq->etm->instructions_id;
> >  	sample.period = period;
> >  	sample.cpu = tidq->packet->cpu;
> > -	sample.flags = tidq->prev_packet->flags;
> >  	sample.cpumode = event->sample.header.misc;
> >
> > -	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
> > +	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
> > +
> > +	/* Populate branch target information only when we encounter
> > +	 * branch instruction, which is at the end of tidq->prev_packet.
> > +	 */
> > +	if (addr == (tidq->prev_packet->end_addr - 4)) {
> > +		/* Update the perf_sample flags using the prev_packet
> > +		 * since that is the queue we are synthesizing.
> > +		 */
> > +		sample.flags = tidq->prev_packet->flags;
> > +
> > +		/* The last instruction of the previous queue would be a
> > +		 * branch operation. Get the target of that branch by looking
> > +		 * into the first executed instruction of the current packet
> > +		 * queue.
> > +		 */
> > +		sample.addr = cs_etm__first_executed_instr(tidq->packet);
> > +	}
> >
> >  	if (etm->synth_opts.last_branch)
> >  		sample.branch_stack = tidq->last_branch;
> > @@ -1641,7 +1657,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  	/* Get instructions remainder from previous packet */
> >  	instrs_prev = tidq->period_instructions;
> >
> > -	tidq->period_instructions += tidq->packet->instr_count;
> > +	tidq->period_instructions += tidq->prev_packet->instr_count;
> >
> >  	/*
> >  	 * Record a branch when the last instruction in
> > @@ -1721,8 +1737,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
> >  			 * been executed, but PC has not advanced to next
> >  			 * instruction)
> >  			 */
> > +			/* Get address from prev_packet since we are synthesizing
> > +			 * that in cs_etm__synth_instruction_sample()
> > +			 */
> >  			addr = cs_etm__instr_addr(etmq, trace_chan_id,
> > -						  tidq->packet, offset - 1);
> > +						  tidq->prev_packet, offset - 1);
> >  			ret = cs_etm__synth_instruction_sample(
> >  				etmq, tidq, addr,
> >  				etm->instructions_sample_period);
> > @@ -1786,7 +1805,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
> >
> >  	/* Handle start tracing packet */
> >  	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
> > -		goto swap_packet;
> > +		goto reset_last_br;
> >
> >  	if (etmq->etm->synth_opts.last_branch &&
> >  	    etmq->etm->synth_opts.instructions &&
> > @@ -1822,8 +1841,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
> >  			return err;
> >  	}
> >
> > -swap_packet:
> > -	cs_etm__packet_swap(etm, tidq);
> > +reset_last_br:
> >
> >  	/* Reset last branches after flush the trace */
> >  	if (etm->synth_opts.last_branch)

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis
  2023-06-28  8:01     ` Tanmay Jagdale
@ 2023-06-28 13:27       ` James Clark
  0 siblings, 0 replies; 5+ messages in thread
From: James Clark @ 2023-06-28 13:27 UTC (permalink / raw)
  To: Tanmay Jagdale
  Cc: linux-arm-kernel, linux-perf-users, linux-kernel,
	Sunil Kovvuri Goutham, George Cherian, Linu Cherian,
	Anil Kumar Reddy H, john.g.garry, will, mike.leach,
	suzuki.poulose, peterz, mingo, acme, mark.rutland, irogers,
	adrian.hunter, coresight



On 28/06/2023 09:01, Tanmay Jagdale wrote:
> Hi James,
> 
>> On 23/06/2023 19:22, Tanmay Jagdale wrote:
>>> The existing method of synthesizing instruction samples has the
>>> following issues:
>>> 1. Non-branch instructions have mnemonics of branch instructions.
>>> 2. Branch target address is missing.
>>>
>>> Set the sample flags only when we reach the last instruction in
>>> the tidq (which would be a branch instruction) to solve issue 1).
>>>
>>> To fix issue 2), start synthesizing the instructions from the
>>> previous packet (tidq->prev_packet) instead of current packet
>>> (tidq->packet). This way, it is easy to figure out the target
>>> address of the branch instruction in tidq->prev_packet which
>>> is the current packet's (tidq->packet) first executed instruction.
>>>
>>> After the switch to processing the previous packet first, we no
>>> longer need to swap the packets during cs_etm__flush().
>>
>> Hi Tanmay,
>>
>> I think the fix for setting the right flags and instruction type makes
>> sense, but is it possible to do it without the change to swapping in
>> cs_etm__flush() or some of the other changes to
>> cs_etm__synth_instruction_sample()?
> Thanks for the review. I took this approach of swapping because
> it would be easy to figure out the target address of the branch.
> If we don't swap the packet synthesis, then we must decode the
> actual instruction to get the branch target address.
> IMHO, this would be a complex change.
> 
> Since the swapping approach is meant to solve issue 2) I will
> split the patch while posting the next version.
> 
>>
>> I'm seeing some differences in the output related to the PID that's
>> assigned to a sample and some of the addresses that aren't explained by
>> the commit message. Also there is no corresponding change to
>> cs_etm__synth_branch_sample(), which is also using prev_packet etc so
>> I'm wondering if that's correct now without the swap? That function gets
>> used with the default itrace options or itrace=b
> IMHO the existing way to handle itrace=b or itrace is correct and
> does not need any change since we are interested in only the last and
> first instruction from tidq->prev_packet and tidq->packet respectively,
> to generate the branching information.
> 
>>
>> For example if I run 'perf script --itrace=i100ns' and diff the output
>> before and after your change I see a difference even though branch and
>> instruction info isn't printed, so I wouldn't expect to see any changes.
>> This is on a systemwide recording of a system under load.
> Yes, the "tr start" mnemonic isn't printed in the flags column, but
> rest of columns are the same. Will fix this in the next version.
> 
> If you have observed any other differences, can you please share them ?
> 
> Also, can you please share the test case commands so that I can run them
> before submitting the next version ?

It's just if you diff any existing recording it seems that there are
differences. In this case below I see a difference in the ordering of
the samples generated. In a more complicated case with a VM running I
also see a difference in which PID is assigned to some samples. It's
probably not related to the VM but just that there was more going on on
the machine.

It's not necessarily wrong, but we don't currently have any tests that
verify the complete correctness of the decoding, so unless the commit
message explains why there should be a difference we shouldn't make any
changes.

  stress -c 4 &
  sudo perf record -e cs_etm// -a -m,16M -- sleep 3

  sudo perf-before-change script --itrace=i1000i > before
  sudo perf-after-change script --itrace=i1000i > after

  diff before after

  29d28
  <           stress    8198 [000]  1096.381602:       1000
instructions:      aaaaceb70f2c rand@plt+0xc (/usr/bin/stress)
  30a30
  >           stress    8198 [000]  1096.381602:       1000
instructions:      aaaaceb70f2c rand@plt+0xc (/usr/bin/stress)
  193d192
  <           stress    8200 [001]  1096.381602:       1000
instructions:      aaaaceb70f28 rand@plt+0x8 (/usr/bin/stress)
  194a194
  >           stress    8200 [001]  1096.381602:       1000
instructions:      aaaaceb70f28 rand@plt+0x8 (/usr/bin/stress)



> 
> Thanks and regards,
> Tanmay
>>
>> Thanks
>> James
>>
>>>
>>> Signed-off-by: Tanmay Jagdale <tanmay@marvell.com>
>>> ---
>>>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++-------
>>>  1 file changed, 25 insertions(+), 7 deletions(-)
>>>
>>> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
>>> index 91299cc56bf7..446e00d98fd5 100644
>>> --- a/tools/perf/util/cs-etm.c
>>> +++ b/tools/perf/util/cs-etm.c
>>> @@ -1418,10 +1418,26 @@ static int cs_etm__synth_instruction_sample(struct cs_etm_queue *etmq,
>>>  	sample.stream_id = etmq->etm->instructions_id;
>>>  	sample.period = period;
>>>  	sample.cpu = tidq->packet->cpu;
>>> -	sample.flags = tidq->prev_packet->flags;
>>>  	sample.cpumode = event->sample.header.misc;
>>>
>>> -	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->packet, &sample);
>>> +	cs_etm__copy_insn(etmq, tidq->trace_chan_id, tidq->prev_packet, &sample);
>>> +
>>> +	/* Populate branch target information only when we encounter
>>> +	 * branch instruction, which is at the end of tidq->prev_packet.
>>> +	 */
>>> +	if (addr == (tidq->prev_packet->end_addr - 4)) {
>>> +		/* Update the perf_sample flags using the prev_packet
>>> +		 * since that is the queue we are synthesizing.
>>> +		 */
>>> +		sample.flags = tidq->prev_packet->flags;
>>> +
>>> +		/* The last instruction of the previous queue would be a
>>> +		 * branch operation. Get the target of that branch by looking
>>> +		 * into the first executed instruction of the current packet
>>> +		 * queue.
>>> +		 */
>>> +		sample.addr = cs_etm__first_executed_instr(tidq->packet);
>>> +	}
>>>
>>>  	if (etm->synth_opts.last_branch)
>>>  		sample.branch_stack = tidq->last_branch;
>>> @@ -1641,7 +1657,7 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>>  	/* Get instructions remainder from previous packet */
>>>  	instrs_prev = tidq->period_instructions;
>>>
>>> -	tidq->period_instructions += tidq->packet->instr_count;
>>> +	tidq->period_instructions += tidq->prev_packet->instr_count;
>>>
>>>  	/*
>>>  	 * Record a branch when the last instruction in
>>> @@ -1721,8 +1737,11 @@ static int cs_etm__sample(struct cs_etm_queue *etmq,
>>>  			 * been executed, but PC has not advanced to next
>>>  			 * instruction)
>>>  			 */
>>> +			/* Get address from prev_packet since we are synthesizing
>>> +			 * that in cs_etm__synth_instruction_sample()
>>> +			 */
>>>  			addr = cs_etm__instr_addr(etmq, trace_chan_id,
>>> -						  tidq->packet, offset - 1);
>>> +						  tidq->prev_packet, offset - 1);
>>>  			ret = cs_etm__synth_instruction_sample(
>>>  				etmq, tidq, addr,
>>>  				etm->instructions_sample_period);
>>> @@ -1786,7 +1805,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>>
>>>  	/* Handle start tracing packet */
>>>  	if (tidq->prev_packet->sample_type == CS_ETM_EMPTY)
>>> -		goto swap_packet;
>>> +		goto reset_last_br;
>>>
>>>  	if (etmq->etm->synth_opts.last_branch &&
>>>  	    etmq->etm->synth_opts.instructions &&
>>> @@ -1822,8 +1841,7 @@ static int cs_etm__flush(struct cs_etm_queue *etmq,
>>>  			return err;
>>>  	}
>>>
>>> -swap_packet:
>>> -	cs_etm__packet_swap(etm, tidq);
>>> +reset_last_br:
>>>
>>>  	/* Reset last branches after flush the trace */
>>>  	if (etm->synth_opts.last_branch)

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2023-06-28 13:29 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-06-23 18:22 [PATCH 0/1] Fix Coresight instruction synthesis logic Tanmay Jagdale
2023-06-23 18:22 ` [PATCH 1/1] perf: cs-etm: Fixes in instruction sample synthesis Tanmay Jagdale
2023-06-26 10:20   ` James Clark
2023-06-28  8:01     ` Tanmay Jagdale
2023-06-28 13:27       ` James Clark

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).