* [PATCH] perf tools: Add SPE total latency as PERF_SAMPLE_WEIGHT
From: Namhyung Kim @ 2021-12-01  0:39 UTC
  To: Arnaldo Carvalho de Melo, Jiri Olsa
  Cc: Ingo Molnar, Peter Zijlstra, LKML, Andi Kleen, Ian Rogers,
	Stephane Eranian, German Gomez, Leo Yan, Mark Rutland

Use total latency info in the SPE counter packet as sample weight so
that we can see it in local_weight and (global) weight sort keys.

PERF_SAMPLE_WEIGHT_STRUCT could perhaps be used to support ins_lat as
well, but I'm not sure which SPE latency it corresponds to, so this
patch adds only the total latency for now.
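
As a rough illustration of the PERF_SAMPLE_WEIGHT_STRUCT idea (not part
of this patch): the sketch below assumes the 64-bit weight word is split
into a low 32-bit field and two 16-bit fields, as in the uapi
union perf_sample_weight, and that perf reads ins_lat from the first
16-bit field. The sketch_* names are hypothetical, and the question of
which SPE counter should feed ins_lat stays open.

```c
#include <stdint.h>

/* Hypothetical decoded view of the 64-bit weight word. */
struct sketch_weight {
	uint32_t var1_dw;	/* bits 0-31: assumed to be reported as the weight */
	uint16_t var2_w;	/* bits 32-47: assumed to be reported as ins_lat */
	uint16_t var3_w;	/* bits 48-63: not used in this sketch */
};

/* Pack with shifts so the result does not depend on host endianness. */
static uint64_t sketch_pack_weight(struct sketch_weight w)
{
	return (uint64_t)w.var1_dw |
	       ((uint64_t)w.var2_w << 32) |
	       ((uint64_t)w.var3_w << 48);
}

/* Split a packed weight word back into its three fields. */
static struct sketch_weight sketch_unpack_weight(uint64_t full)
{
	struct sketch_weight w = {
		.var1_dw = (uint32_t)(full & 0xffffffffULL),
		.var2_w  = (uint16_t)((full >> 32) & 0xffff),
		.var3_w  = (uint16_t)(full >> 48),
	};
	return w;
}
```

With plain PERF_SAMPLE_WEIGHT, as used in this patch, the whole 64-bit
word is the weight, so only the total latency is carried.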

Cc: German Gomez <german.gomez@arm.com>
Cc: Leo Yan <leo.yan@linaro.org>
Cc: Mark Rutland <mark.rutland@arm.com>
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.c | 2 ++
 tools/perf/util/arm-spe-decoder/arm-spe-decoder.h | 1 +
 tools/perf/util/arm-spe.c                         | 4 +++-
 3 files changed, 6 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
index 3fc528c9270c..5e390a1a79ab 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.c
@@ -179,6 +179,8 @@ static int arm_spe_read_record(struct arm_spe_decoder *decoder)
 				decoder->record.phys_addr = ip;
 			break;
 		case ARM_SPE_COUNTER:
+			if (idx == SPE_CNT_PKT_HDR_INDEX_TOTAL_LAT)
+				decoder->record.latency = payload;
 			break;
 		case ARM_SPE_CONTEXT:
 			decoder->record.context_id = payload;
diff --git a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
index 46a8556a9e95..69b31084d6be 100644
--- a/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
+++ b/tools/perf/util/arm-spe-decoder/arm-spe-decoder.h
@@ -33,6 +33,7 @@ struct arm_spe_record {
 	enum arm_spe_sample_type type;
 	int err;
 	u32 op;
+	u32 latency;
 	u64 from_ip;
 	u64 to_ip;
 	u64 timestamp;
diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
index 4748bcfe61de..a756325c72a7 100644
--- a/tools/perf/util/arm-spe.c
+++ b/tools/perf/util/arm-spe.c
@@ -317,6 +317,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
 	sample.addr = record->virt_addr;
 	sample.phys_addr = record->phys_addr;
 	sample.data_src = data_src;
+	sample.weight = record->latency;
 
 	return arm_spe_deliver_synth_event(spe, speq, event, &sample);
 }
@@ -980,7 +981,8 @@ arm_spe_synth_events(struct arm_spe *spe, struct perf_session *session)
 	attr.type = PERF_TYPE_HARDWARE;
 	attr.sample_type = evsel->core.attr.sample_type & PERF_SAMPLE_MASK;
 	attr.sample_type |= PERF_SAMPLE_IP | PERF_SAMPLE_TID |
-			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC;
+			    PERF_SAMPLE_PERIOD | PERF_SAMPLE_DATA_SRC |
+			    PERF_SAMPLE_WEIGHT;
 	if (spe->timeless_decoding)
 		attr.sample_type &= ~(u64)PERF_SAMPLE_TIME;
 	else
-- 
2.34.0.rc2.393.gf8c9666880-goog
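
For readers without the tree at hand, here is a minimal standalone
sketch of what the decoder hunk above does: while folding packets into
one record, a counter packet whose header index is "total latency" has
its payload stored in the record. The packet and record types are
hypothetical cut-down stand-ins, not the real struct arm_spe_pkt and
struct arm_spe_record, and the index values are assumed to match
arm-spe-pkt-decoder.h.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Counter packet header indexes; values assumed, see the note above. */
enum {
	SKETCH_CNT_IDX_TOTAL_LAT = 0x0,
	SKETCH_CNT_IDX_ISSUE_LAT = 0x1,
	SKETCH_CNT_IDX_TRANS_LAT = 0x2,
};

/* Hypothetical packet kinds, far fewer than the real decoder handles. */
enum sketch_pkt_type {
	SKETCH_PKT_COUNTER,
	SKETCH_PKT_CONTEXT,
	SKETCH_PKT_END,
};

struct sketch_pkt {
	enum sketch_pkt_type type;
	unsigned int idx;	/* header index, meaningful for counter packets */
	uint64_t payload;
};

struct sketch_record {
	uint32_t latency;	/* total latency taken from the counter packet */
	uint64_t context_id;
};

/*
 * Fold packets into *rec until an END packet is seen.
 * Returns the number of packets consumed.
 */
static size_t sketch_read_record(const struct sketch_pkt *pkts, size_t n,
				 struct sketch_record *rec)
{
	size_t i;

	memset(rec, 0, sizeof(*rec));

	for (i = 0; i < n; i++) {
		const struct sketch_pkt *p = &pkts[i];

		switch (p->type) {
		case SKETCH_PKT_COUNTER:
			/* Issue and translation latency are ignored here. */
			if (p->idx == SKETCH_CNT_IDX_TOTAL_LAT)
				rec->latency = (uint32_t)p->payload;
			break;
		case SKETCH_PKT_CONTEXT:
			rec->context_id = p->payload;
			break;
		case SKETCH_PKT_END:
			return i + 1;
		}
	}
	return i;
}
```

Because the record is cleared before each pass, a record with no
total-latency counter packet ends up with a latency (and so a sample
weight) of 0 in this sketch.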



* Re: [PATCH] perf tools: Add SPE total latency as PERF_SAMPLE_WEIGHT
From: Leo Yan @ 2021-12-01  5:08 UTC
  To: Namhyung Kim
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, German Gomez,
	Mark Rutland, James Clark

Hi Namhyung,

On Tue, Nov 30, 2021 at 04:39:08PM -0800, Namhyung Kim wrote:
> diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> index 4748bcfe61de..a756325c72a7 100644
> --- a/tools/perf/util/arm-spe.c
> +++ b/tools/perf/util/arm-spe.c
> @@ -317,6 +317,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
>  	sample.addr = record->virt_addr;
>  	sample.phys_addr = record->phys_addr;
>  	sample.data_src = data_src;
> +	sample.weight = record->latency;

The latency also applies to branch operations, so it would be good to
assign it for branch samples too, in the function
arm_spe__synth_branch_sample().

With latency added for branch samples, the change looks good to me:

Reviewed-by: Leo Yan <leo.yan@linaro.org>
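
To make the suggestion concrete, a minimal sketch under stated
assumptions: both synthesis paths copy the record's latency into the
sample weight. The struct and function names below are hypothetical
stand-ins, not the real arm_spe__synth_mem_sample() and
arm_spe__synth_branch_sample(), and the ip/addr assignments are assumed
for illustration only.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical cut-down record and sample types. */
struct sketch_record {
	uint32_t latency;
	uint64_t from_ip;
	uint64_t to_ip;
	uint64_t virt_addr;
};

struct sketch_sample {
	uint64_t ip;
	uint64_t addr;
	uint64_t weight;
};

/* Memory sample: addr is the accessed virtual address. */
static void sketch_synth_mem_sample(const struct sketch_record *rec,
				    struct sketch_sample *sample)
{
	memset(sample, 0, sizeof(*sample));
	sample->ip = rec->from_ip;
	sample->addr = rec->virt_addr;
	sample->weight = rec->latency;
}

/* Branch sample: addr is the branch target; weight is set the same way. */
static void sketch_synth_branch_sample(const struct sketch_record *rec,
				       struct sketch_sample *sample)
{
	memset(sample, 0, sizeof(*sample));
	sample->ip = rec->from_ip;
	sample->addr = rec->to_ip;
	sample->weight = rec->latency;
}
```

Setting the weight in both paths keeps the local_weight and weight sort
keys populated whichever kind of sample is being reported.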



* Re: [PATCH] perf tools: Add SPE total latency as PERF_SAMPLE_WEIGHT
From: Namhyung Kim @ 2021-12-01 17:11 UTC
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Ingo Molnar, Peter Zijlstra,
	LKML, Andi Kleen, Ian Rogers, Stephane Eranian, German Gomez,
	Mark Rutland, James Clark

Hi Leo,

On Tue, Nov 30, 2021 at 9:08 PM Leo Yan <leo.yan@linaro.org> wrote:
> > diff --git a/tools/perf/util/arm-spe.c b/tools/perf/util/arm-spe.c
> > index 4748bcfe61de..a756325c72a7 100644
> > --- a/tools/perf/util/arm-spe.c
> > +++ b/tools/perf/util/arm-spe.c
> > @@ -317,6 +317,7 @@ static int arm_spe__synth_mem_sample(struct arm_spe_queue *speq,
> >       sample.addr = record->virt_addr;
> >       sample.phys_addr = record->phys_addr;
> >       sample.data_src = data_src;
> > +     sample.weight = record->latency;
>
> The latency also applies to branch operations, so it would be good to
> assign it for branch samples too, in the function
> arm_spe__synth_branch_sample().

Yep, I'll update.

>
> With latency added for branch samples, the change looks good to me:
>
> Reviewed-by: Leo Yan <leo.yan@linaro.org>

Thanks for your review!
Namhyung


