All of lore.kernel.org
 help / color / mirror / Atom feed
From: Leo Yan <leo.yan@linaro.org>
To: James Clark <james.clark@arm.com>
Cc: coresight@lists.linaro.org, al.grant@arm.com,
	branislav.rankov@arm.com, denik@chromium.org,
	suzuki.poulose@arm.com, Mike Leach <mike.leach@linaro.org>,
	Mark Rutland <mark.rutland@arm.com>,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Jiri Olsa <jolsa@redhat.com>, Namhyung Kim <namhyung@kernel.org>,
	John Garry <john.garry@huawei.com>, Will Deacon <will@kernel.org>,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	linux-arm-kernel@lists.infradead.org,
	linux-kernel@vger.kernel.org
Subject: Re: [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
Date: Sat, 27 Feb 2021 15:10:56 +0800	[thread overview]
Message-ID: <20210227071056.GA3317@leoy-ThinkPad-X240s> (raw)
In-Reply-To: <20210212144513.31765-4-james.clark@arm.com>

On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
> The aux records will be used set the bounds of decoding in a
> later commit. In the future we may also want to use the flags
> of each record to control decoding.
> 
> Do these need to be saved in their entirety, or can pointers
> to each record safely be saved instead for later access?

Rather than introudcing the perf record list, I just wander if we can
use easier method to fix this problem.  So below is the rough idea
(though I don't really verify it):

The essential information we need is what's the valid buffer length can
be used for decoding.  Though cs_etm_queue::buf_len tracks the buffer
length, but it's the buffer length is for the whole AUX buffer, and
which belongs to multiple "PERF_RECORD_AUX" events.  So we cannot decode
at once for the whole trace data in the AUX trace buffer, on the other
hand, the incoming "PERF_RECORD_AUX" event can guide the CoreSight
decoder it should decode how much buffer size.  At the end, the trace
data can be decoded step by step based on the incoming "PERF_RECORD_AUX"
events.

I'd like to propose to add a new field "cs_etm_queue::buf_rec_len", it
stands for the record length based on the RECORD_AUX event.  In
theory, this value should be always less than "cs_etm_queue::buf_len".

When every time the "PERF_RECORD_AUX" event is coming, we find out the
corresponding queue (so this can be applied for "1:1" or "N:1" models
for source and sink), and accumulate "perf_record_aux::aux_size" into
"cs_etm_queue::buf_rec_len".

At the decoder side, it decreases "etmq->buf_rec_len" until to zero for
the current round of decoding (see cs_etm__decode_data_block()).  Since
all the "PERF_RECORD_AUX" event will be processed before
"PERF_RECORD_EXIT" event, so we don't worry the tail trace data will be
ignored.

The main reason for this suggestion is it don't need to change the
significant logic in current code.  I will try to do experiment for this
idea and share back.

James, if you think I miss anything, please correct me as needed.
Thanks!

Leo

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8f8b448632fb..88b541b2a804 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -92,12 +92,16 @@ struct cs_etm_queue {
>  	/* Conversion between traceID and index in traceid_queues array */
>  	struct intlist *traceid_queues_list;
>  	struct cs_etm_traceid_queue **traceid_queues;
> +	int aux_record_list_len;
> +	int aux_record_list_idx;
> +	struct perf_record_aux *aux_record_list;
>  };
>  
>  /* RB tree for quick conversion between traceID and metadata pointers */
>  static struct intlist *traceid_list;
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
> +				 struct perf_record_aux *aux_record);
>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>  					   pid_t tid);
> @@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
>  
>  	cs_etm_decoder__free(etmq->decoder);
>  	cs_etm__free_traceid_queues(etmq);
> +	free(etmq->aux_record_list);
>  	free(etmq);
>  }
>  
> @@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
>  	return NULL;
>  }
>  
> +static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
> +				   struct perf_record_aux *aux_record)
> +{
> +	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
> +					      etmq->aux_record_list_len+1,
> +					      sizeof(*etmq->aux_record_list));
> +	if (!etmq->aux_record_list)
> +		return -ENOMEM;
> +
> +	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
> +	return 0;
> +}
> +
>  static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
>  {
>  	int ret = 0;
> @@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>  	return 0;
>  }
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
>  {
>  	int ret;
>  	if (etm->queues.new_data) {
> @@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>  			return ret;
>  	}
>  
> +	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
> +	if (cpu < 0)
> +		cpu = 0;
> +
> +	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
> +	if (ret)
> +		return ret;
> +
>  	if (!etm->timeless_decoding)
>  		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
>  	else
> @@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
>  
>  	if ((timestamp || etm->timeless_decoding)
>  			&& event->header.type == PERF_RECORD_AUX) {
> -		err = cs_etm__update_queues(etm, sample->cpu);
> +		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
>  		if (err)
>  			return err;
>  	}
> -- 
> 2.28.0
> 

WARNING: multiple messages have this Message-ID (diff)
From: Leo Yan <leo.yan@linaro.org>
To: James Clark <james.clark@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	branislav.rankov@arm.com, al.grant@arm.com, denik@chromium.org,
	Mathieu Poirier <mathieu.poirier@linaro.org>,
	suzuki.poulose@arm.com,
	Alexander Shishkin <alexander.shishkin@linux.intel.com>,
	Will Deacon <will@kernel.org>,
	coresight@lists.linaro.org, John Garry <john.garry@huawei.com>,
	linux-kernel@vger.kernel.org, Namhyung Kim <namhyung@kernel.org>,
	Jiri Olsa <jolsa@redhat.com>,
	linux-arm-kernel@lists.infradead.org,
	Mike Leach <mike.leach@linaro.org>
Subject: Re: [PATCH 3/7] perf cs-etm: Save aux records in each etm queue
Date: Sat, 27 Feb 2021 15:10:56 +0800	[thread overview]
Message-ID: <20210227071056.GA3317@leoy-ThinkPad-X240s> (raw)
In-Reply-To: <20210212144513.31765-4-james.clark@arm.com>

On Fri, Feb 12, 2021 at 04:45:09PM +0200, James Clark wrote:
> The aux records will be used set the bounds of decoding in a
> later commit. In the future we may also want to use the flags
> of each record to control decoding.
> 
> Do these need to be saved in their entirety, or can pointers
> to each record safely be saved instead for later access?

Rather than introudcing the perf record list, I just wander if we can
use easier method to fix this problem.  So below is the rough idea
(though I don't really verify it):

The essential information we need is what's the valid buffer length can
be used for decoding.  Though cs_etm_queue::buf_len tracks the buffer
length, but it's the buffer length is for the whole AUX buffer, and
which belongs to multiple "PERF_RECORD_AUX" events.  So we cannot decode
at once for the whole trace data in the AUX trace buffer, on the other
hand, the incoming "PERF_RECORD_AUX" event can guide the CoreSight
decoder it should decode how much buffer size.  At the end, the trace
data can be decoded step by step based on the incoming "PERF_RECORD_AUX"
events.

I'd like to propose to add a new field "cs_etm_queue::buf_rec_len", it
stands for the record length based on the RECORD_AUX event.  In
theory, this value should be always less than "cs_etm_queue::buf_len".

When every time the "PERF_RECORD_AUX" event is coming, we find out the
corresponding queue (so this can be applied for "1:1" or "N:1" models
for source and sink), and accumulate "perf_record_aux::aux_size" into
"cs_etm_queue::buf_rec_len".

At the decoder side, it decreases "etmq->buf_rec_len" until to zero for
the current round of decoding (see cs_etm__decode_data_block()).  Since
all the "PERF_RECORD_AUX" event will be processed before
"PERF_RECORD_EXIT" event, so we don't worry the tail trace data will be
ignored.

The main reason for this suggestion is it don't need to change the
significant logic in current code.  I will try to do experiment for this
idea and share back.

James, if you think I miss anything, please correct me as needed.
Thanks!

Leo

> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 32 +++++++++++++++++++++++++++++---
>  1 file changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 8f8b448632fb..88b541b2a804 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -92,12 +92,16 @@ struct cs_etm_queue {
>  	/* Conversion between traceID and index in traceid_queues array */
>  	struct intlist *traceid_queues_list;
>  	struct cs_etm_traceid_queue **traceid_queues;
> +	int aux_record_list_len;
> +	int aux_record_list_idx;
> +	struct perf_record_aux *aux_record_list;
>  };
>  
>  /* RB tree for quick conversion between traceID and metadata pointers */
>  static struct intlist *traceid_list;
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu);
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu,
> +				 struct perf_record_aux *aux_record);
>  static int cs_etm__process_queues(struct cs_etm_auxtrace *etm);
>  static int cs_etm__process_timeless_queues(struct cs_etm_auxtrace *etm,
>  					   pid_t tid);
> @@ -585,6 +589,7 @@ static void cs_etm__free_queue(void *priv)
>  
>  	cs_etm_decoder__free(etmq->decoder);
>  	cs_etm__free_traceid_queues(etmq);
> +	free(etmq->aux_record_list);
>  	free(etmq);
>  }
>  
> @@ -759,6 +764,19 @@ static struct cs_etm_queue *cs_etm__alloc_queue(struct cs_etm_auxtrace *etm)
>  	return NULL;
>  }
>  
> +static int cs_etm__save_aux_record(struct cs_etm_queue *etmq,
> +				   struct perf_record_aux *aux_record)
> +{
> +	etmq->aux_record_list = reallocarray(etmq->aux_record_list,
> +					      etmq->aux_record_list_len+1,
> +					      sizeof(*etmq->aux_record_list));
> +	if (!etmq->aux_record_list)
> +		return -ENOMEM;
> +
> +	etmq->aux_record_list[etmq->aux_record_list_len++] = *aux_record;
> +	return 0;
> +}
> +
>  static int cs_etm__search_first_timestamp(struct cs_etm_queue *etmq)
>  {
>  	int ret = 0;
> @@ -865,7 +883,7 @@ static int cs_etm__setup_queues(struct cs_etm_auxtrace *etm)
>  	return 0;
>  }
>  
> -static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
> +static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu, struct perf_record_aux *aux)
>  {
>  	int ret;
>  	if (etm->queues.new_data) {
> @@ -875,6 +893,14 @@ static int cs_etm__update_queues(struct cs_etm_auxtrace *etm, int cpu)
>  			return ret;
>  	}
>  
> +	/* In timeless mode, cpu is set to -1, and a single aux buffer is filled */
> +	if (cpu < 0)
> +		cpu = 0;
> +
> +	ret = cs_etm__save_aux_record(etm->queues.queue_array[cpu].priv, aux);
> +	if (ret)
> +		return ret;
> +
>  	if (!etm->timeless_decoding)
>  		return cs_etm__search_first_timestamp(etm->queues.queue_array[cpu].priv);
>  	else
> @@ -2357,7 +2383,7 @@ static int cs_etm__process_event(struct perf_session *session,
>  
>  	if ((timestamp || etm->timeless_decoding)
>  			&& event->header.type == PERF_RECORD_AUX) {
> -		err = cs_etm__update_queues(etm, sample->cpu);
> +		err = cs_etm__update_queues(etm, sample->cpu, &event->aux);
>  		if (err)
>  			return err;
>  	}
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

  reply	other threads:[~2021-02-27  7:21 UTC|newest]

Thread overview: 39+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-02-12 14:45 [PATCH 0/7] Split Coresight decode by aux records James Clark
2021-02-12 14:45 ` James Clark
2021-02-12 14:45 ` [PATCH 1/7] perf cs-etm: Split up etm queue setup function James Clark
2021-02-12 14:45   ` James Clark
2021-02-20  8:11   ` Leo Yan
2021-02-20  8:11     ` Leo Yan
2021-02-12 14:45 ` [PATCH 2/7] perf cs-etm: Only search timestamp in current sample's queue James Clark
2021-02-12 14:45   ` James Clark
2021-02-20 11:50   ` Leo Yan
2021-02-20 11:50     ` Leo Yan
2021-03-01 15:28     ` James Clark
2021-03-01 15:28       ` James Clark
2021-03-02 11:52       ` Leo Yan
2021-03-02 11:52         ` Leo Yan
2021-05-06 10:45   ` James Clark
2021-05-06 10:45     ` James Clark
2021-02-12 14:45 ` [PATCH 3/7] perf cs-etm: Save aux records in each etm queue James Clark
2021-02-12 14:45   ` James Clark
2021-02-27  7:10   ` Leo Yan [this message]
2021-02-27  7:10     ` Leo Yan
2021-03-01 15:43     ` James Clark
2021-03-01 15:43       ` James Clark
2021-03-02 12:03       ` Leo Yan
2021-03-02 12:03         ` Leo Yan
2021-03-10 15:41         ` James Clark
2021-02-12 14:45 ` [PATCH 4/7] perf cs-etm: don't process queues until cs_etm__flush_events James Clark
2021-02-12 14:45   ` James Clark
2021-02-12 14:45 ` [PATCH 5/7] perf cs-etm: split decode by aux records James Clark
2021-02-12 14:45   ` James Clark
2021-02-12 14:45 ` [PATCH 6/7] perf cs-etm: Use existing decode code path for --dump-raw-trace James Clark
2021-02-12 14:45   ` James Clark
2021-02-12 14:45 ` [PATCH 7/7] perf cs-etm: Suppress printing when resetting decoder James Clark
2021-02-12 14:45   ` James Clark
2021-02-24 16:13 ` [PATCH 0/7] Split Coresight decode by aux records Mathieu Poirier
2021-02-24 16:13   ` Mathieu Poirier
2021-03-01 14:05   ` James Clark
2021-03-01 14:05     ` James Clark
2021-04-15 20:37 ` Mathieu Poirier
2021-04-15 20:37   ` Mathieu Poirier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=20210227071056.GA3317@leoy-ThinkPad-X240s \
    --to=leo.yan@linaro.org \
    --cc=al.grant@arm.com \
    --cc=alexander.shishkin@linux.intel.com \
    --cc=branislav.rankov@arm.com \
    --cc=coresight@lists.linaro.org \
    --cc=denik@chromium.org \
    --cc=james.clark@arm.com \
    --cc=john.garry@huawei.com \
    --cc=jolsa@redhat.com \
    --cc=linux-arm-kernel@lists.infradead.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=mark.rutland@arm.com \
    --cc=mathieu.poirier@linaro.org \
    --cc=mike.leach@linaro.org \
    --cc=namhyung@kernel.org \
    --cc=suzuki.poulose@arm.com \
    --cc=will@kernel.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.