linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [RFC PATCH] perf session: Fixup timestamp for ordered events
@ 2020-11-18 10:55 Leo Yan
  2020-11-18 11:15 ` Adrian Hunter
  0 siblings, 1 reply; 5+ messages in thread
From: Leo Yan @ 2020-11-18 10:55 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland, Adrian Hunter,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, Alexey Budankov, Kan Liang, linux-kernel,
	Coresight ML
  Cc: Leo Yan

Perf tool relies on MMAP/MMAP2 events to prepare DSO maps, it allocates
DSO maps for MMAP/MMAP2 events which is used for parsing symbol.  Thus,
during the recording, perf tool implictly expects the MMAP/MMAP2 events
should arrive before AUX event, in other words, MMAP/MMAP2's timestamp
should less than AUX event's timestamp, and the MMAP/MMAP2 events will
be added into the front of ordered event list; this can allow the DSO
maps to be prepared and can be consumed when process AUX event.

See the function perf_evlist__prepare_workload(), though it uses pipes
to handshake before perf process (the parent process) and forked process
for the profiled program, it cannot promise the timing between
MMAP/MMAP2 events and AUX event.

On Arm Juno board, the AUX event can be observed to arrive ahead than
MMAP2 event, with the command:

  perf record -e cs_etm/@tmc_etr0/ -S -- dd if=/dev/zero of=/dev/null

The follow diagram depicts the flow for how the AUX event is arriving
ahead than MMAP2 event:

       T1:          T3:                  T4:
  perf process   Open PMU device    Perf is scheduled out;
                                    invoke perf_aux_output_end()
	                            and generate AUX event
       ^            ^                    ^
       |            |                    |
 CPU0 ---------------------------------------------------> (T)
       \
        \  Forked process is placed on another CPU
         V
 CPU1 ---------------------------------------------------> (T)
              |                                |
	      V                                V
	      T2:                              T5:
        Invoke execvp() for profiled       Record MMAP2 event
      	program


In this scenario, the perf main process runs on CPU0 and the profiled
program (which is forked child process) runs on CPU1.  The main process
opens PMU device for AUX trace (T3) and it will generate AUX event until
the perf process is scheduled out (T4); the profiled program will be
launched by execvp() (T2) and later will record MMAP event for memory
mapping (T5).

Usually, the AUX event will be later than MMAP2 event, but on the Arm
Juno platform, it has chance that AUX event occurs prior to MMAP2 event
with two reasons:

- Arm Juno platform is big.LITTLE architecture, so CPU0 is big CPU and
  CPU1 is LITTLE CPU, the performance between big CPU and LITTLE CPU is
  significant, this gives chance for the perf main process to run much
  faster than the profiled program;

- In the kernel, the RT thread (like kernel's CPUFreq thread) has chance
  to preempt perf main thread, so when the perf main thread is
  switched out, the AUX event will be generated and it might be early
  than profiled program's MMAP2 event.

To fix this issue, this patch records the first AUX event's timestamp
into 'aux_timestamp', if find any MMAP/MMAP2 event is late coming, it
fixes up the MMAP/MMAP2 events' timestamp as 'aux_timestamp-1', so the
MMAP/MMAP2 event will be inserted into ordered list ahead than AUX event
and also will be handled before AUX event.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/util/session.c | 25 +++++++++++++++++++++++++
 1 file changed, 25 insertions(+)

diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 098080287c68..1aa54941bf81 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1753,11 +1753,36 @@ static s64 perf_session__process_event(struct perf_session *session,
 
 	if (tool->ordered_events) {
 		u64 timestamp = -1ULL;
+		static u64 aux_timestamp = -1ULL;
 
 		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
 		if (ret && ret != -1)
 			return ret;
 
+		/*
+		 * Cache the first AUX event's timestamp into 'aux_timestamp',
+		 * which is used to fixup MMAP/MMAP2's timestamp.
+		 */
+		if ((event->header.type == PERF_RECORD_AUX) &&
+		    (aux_timestamp == -1ULL))
+			aux_timestamp = timestamp;
+
+		/*
+		 * If the AUX event arrives prior to MMAP/MMAP2 events, it's
+		 * possible to have no chance to create DSOs when decode AUX
+		 * trace data, thus the symbol cannot be parsed properly.
+		 * To allow the DSOs are prepared before process AUX event,
+		 * fixup the MMAP/MMAP2 events' timestamp to be prior to any
+		 * AUX event's timestamp, so MMAP/MMAP2 events will be
+		 * handled ahead and the DSO map will be prepared before AUX
+		 * event handling.
+		 */
+		if (event->header.type == PERF_RECORD_MMAP2 ||
+		    event->header.type == PERF_RECORD_MMAP) {
+			if (timestamp > aux_timestamp)
+				timestamp = aux_timestamp - 1;
+		}
+
 		ret = perf_session__queue_event(session, event, timestamp, file_offset);
 		if (ret != -ETIME)
 			return ret;
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] perf session: Fixup timestamp for ordered events
  2020-11-18 10:55 [RFC PATCH] perf session: Fixup timestamp for ordered events Leo Yan
@ 2020-11-18 11:15 ` Adrian Hunter
  2020-11-18 11:43   ` Leo Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2020-11-18 11:15 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, Alexey Budankov, Kan Liang, linux-kernel,
	Coresight ML

On 18/11/20 12:55 pm, Leo Yan wrote:
> Perf tool relies on MMAP/MMAP2 events to prepare DSO maps, it allocates
> DSO maps for MMAP/MMAP2 events which is used for parsing symbol.  Thus,
> during the recording, perf tool implictly expects the MMAP/MMAP2 events
> should arrive before AUX event,

No it doesn't.  There is an index and queues to queue PERF_RECORD_AUXTRACE
events in advance so that they can be processed in time order.

 in other words, MMAP/MMAP2's timestamp
> should less than AUX event's timestamp, and the MMAP/MMAP2 events will
> be added into the front of ordered event list; this can allow the DSO
> maps to be prepared and can be consumed when process AUX event.
> 
> See the function perf_evlist__prepare_workload(), though it uses pipes
> to handshake before perf process (the parent process) and forked process
> for the profiled program, it cannot promise the timing between
> MMAP/MMAP2 events and AUX event.
> 
> On Arm Juno board, the AUX event can be observed to arrive ahead than
> MMAP2 event, with the command:
> 
>   perf record -e cs_etm/@tmc_etr0/ -S -- dd if=/dev/zero of=/dev/null
> 
> The follow diagram depicts the flow for how the AUX event is arriving
> ahead than MMAP2 event:
> 
>        T1:          T3:                  T4:
>   perf process   Open PMU device    Perf is scheduled out;
>                                     invoke perf_aux_output_end()
> 	                            and generate AUX event
>        ^            ^                    ^
>        |            |                    |
>  CPU0 ---------------------------------------------------> (T)
>        \
>         \  Forked process is placed on another CPU
>          V
>  CPU1 ---------------------------------------------------> (T)
>               |                                |
> 	      V                                V
> 	      T2:                              T5:
>         Invoke execvp() for profiled       Record MMAP2 event
>       	program

I don't understand.  An MMAP at T5 cannot affect data recorded before T5
(e.g. T4).
Are you saying the timestamps from different CPUs are not in time order?  Ot
are you saying the data is actually recorded after T5?

> 
> 
> In this scenario, the perf main process runs on CPU0 and the profiled
> program (which is forked child process) runs on CPU1.  The main process
> opens PMU device for AUX trace (T3) and it will generate AUX event until
> the perf process is scheduled out (T4); the profiled program will be
> launched by execvp() (T2) and later will record MMAP event for memory
> mapping (T5).
> 
> Usually, the AUX event will be later than MMAP2 event, but on the Arm
> Juno platform, it has chance that AUX event occurs prior to MMAP2 event
> with two reasons:
> 
> - Arm Juno platform is big.LITTLE architecture, so CPU0 is big CPU and
>   CPU1 is LITTLE CPU, the performance between big CPU and LITTLE CPU is
>   significant, this gives chance for the perf main process to run much
>   faster than the profiled program;
> 
> - In the kernel, the RT thread (like kernel's CPUFreq thread) has chance
>   to preempt perf main thread, so when the perf main thread is
>   switched out, the AUX event will be generated and it might be early
>   than profiled program's MMAP2 event.
> 
> To fix this issue, this patch records the first AUX event's timestamp
> into 'aux_timestamp', if find any MMAP/MMAP2 event is late coming, it
> fixes up the MMAP/MMAP2 events' timestamp as 'aux_timestamp-1', so the
> MMAP/MMAP2 event will be inserted into ordered list ahead than AUX event
> and also will be handled before AUX event.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  tools/perf/util/session.c | 25 +++++++++++++++++++++++++
>  1 file changed, 25 insertions(+)
> 
> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> index 098080287c68..1aa54941bf81 100644
> --- a/tools/perf/util/session.c
> +++ b/tools/perf/util/session.c
> @@ -1753,11 +1753,36 @@ static s64 perf_session__process_event(struct perf_session *session,
>  
>  	if (tool->ordered_events) {
>  		u64 timestamp = -1ULL;
> +		static u64 aux_timestamp = -1ULL;
>  
>  		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>  		if (ret && ret != -1)
>  			return ret;
>  
> +		/*
> +		 * Cache the first AUX event's timestamp into 'aux_timestamp',
> +		 * which is used to fixup MMAP/MMAP2's timestamp.
> +		 */
> +		if ((event->header.type == PERF_RECORD_AUX) &&
> +		    (aux_timestamp == -1ULL))
> +			aux_timestamp = timestamp;
> +
> +		/*
> +		 * If the AUX event arrives prior to MMAP/MMAP2 events, it's
> +		 * possible to have no chance to create DSOs when decode AUX
> +		 * trace data, thus the symbol cannot be parsed properly.
> +		 * To allow the DSOs are prepared before process AUX event,
> +		 * fixup the MMAP/MMAP2 events' timestamp to be prior to any
> +		 * AUX event's timestamp, so MMAP/MMAP2 events will be
> +		 * handled ahead and the DSO map will be prepared before AUX
> +		 * event handling.
> +		 */
> +		if (event->header.type == PERF_RECORD_MMAP2 ||
> +		    event->header.type == PERF_RECORD_MMAP) {
> +			if (timestamp > aux_timestamp)
> +				timestamp = aux_timestamp - 1;
> +		}
> +
>  		ret = perf_session__queue_event(session, event, timestamp, file_offset);
>  		if (ret != -ETIME)
>  			return ret;
> 


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] perf session: Fixup timestamp for ordered events
  2020-11-18 11:15 ` Adrian Hunter
@ 2020-11-18 11:43   ` Leo Yan
  2020-11-18 15:57     ` Adrian Hunter
  0 siblings, 1 reply; 5+ messages in thread
From: Leo Yan @ 2020-11-18 11:43 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, Alexey Budankov, Kan Liang, linux-kernel,
	Coresight ML

Hi Adrian,

On Wed, Nov 18, 2020 at 01:15:32PM +0200, Adrian Hunter wrote:
> On 18/11/20 12:55 pm, Leo Yan wrote:
> > Perf tool relies on MMAP/MMAP2 events to prepare DSO maps, it allocates
> > DSO maps for MMAP/MMAP2 events which is used for parsing symbol.  Thus,
> > during the recording, perf tool implictly expects the MMAP/MMAP2 events
> > should arrive before AUX event,
> 
> No it doesn't.  There is an index and queues to queue PERF_RECORD_AUXTRACE
> events in advance so that they can be processed in time order.
> 
>  in other words, MMAP/MMAP2's timestamp
> > should less than AUX event's timestamp, and the MMAP/MMAP2 events will
> > be added into the front of ordered event list; this can allow the DSO
> > maps to be prepared and can be consumed when process AUX event.
> > 
> > See the function perf_evlist__prepare_workload(), though it uses pipes
> > to handshake before perf process (the parent process) and forked process
> > for the profiled program, it cannot promise the timing between
> > MMAP/MMAP2 events and AUX event.
> > 
> > On Arm Juno board, the AUX event can be observed to arrive ahead than
> > MMAP2 event, with the command:
> > 
> >   perf record -e cs_etm/@tmc_etr0/ -S -- dd if=/dev/zero of=/dev/null
> > 
> > The follow diagram depicts the flow for how the AUX event is arriving
> > ahead than MMAP2 event:
> > 
> >        T1:          T3:                  T4:
> >   perf process   Open PMU device    Perf is scheduled out;
> >                                     invoke perf_aux_output_end()
> > 	                            and generate AUX event
> >        ^            ^                    ^
> >        |            |                    |
> >  CPU0 ---------------------------------------------------> (T)
> >        \
> >         \  Forked process is placed on another CPU
> >          V
> >  CPU1 ---------------------------------------------------> (T)
> >               |                                |
> >               V                                V
> >               T2:                              T5:
> >         Invoke execvp() for profiled       Record MMAP2 event
> >       	program
> 
> I don't understand.  An MMAP at T5 cannot affect data recorded before T5
> (e.g. T4).
> Are you saying the timestamps from different CPUs are not in time order?  Ot
> are you saying the data is actually recorded after T5?

Here refers to the later one.

I observed the PMU device is opened and closed for multiple times, so
the AUX events and MMAP2 events are interleave with each other.  Below
is the dumping with debugging option "--debug ordered-events=1" (and I
added code for printing out event type value).

When perf tool starts to decode the trace data for the first AUX event
(its timestamp is 160.767623) and try to synthesize samples, it doesn't
give chance for handling the sequential MMAP2 events (ts: 160.770223,
160.770451, 160.773111).

[          160.765623] queue_event nr_events 1
[          160.765623] new event on queue: type=12
[          160.765714] queue_event nr_events 2
[          160.765714] new event on queue: type=3
[          160.765822] queue_event nr_events 3
[          160.765822] new event on queue: type=10  ==> MMAP2 event
[          160.766180] queue_event nr_events 4
[          160.766180] new event on queue: type=14
[          160.767623] queue_event nr_events 5
[          160.767623] new event on queue: type=11  ==> AUX event
[          160.769710] queue_event nr_events 6
[          160.769710] new event on queue: type=12
[          160.769762] queue_event nr_events 7
[          160.769762] new event on queue: type=14
[          160.769810] queue_event nr_events 8
[          160.769810] new event on queue: type=14
[          160.769943] queue_event nr_events 9
[          160.769943] new event on queue: type=11  ==> AUX event
[          160.770028] queue_event nr_events 10
[          160.770028] new event on queue: type=12
[          160.770073] queue_event nr_events 11
[          160.770073] new event on queue: type=14
[          160.770223] queue_event nr_events 12
[          160.770223] new event on queue: type=10  ==> MMAP2 event
[          160.770451] queue_event nr_events 13
[          160.770451] new event on queue: type=10  ==> MMAP2 event
[          160.770628] queue_event nr_events 14
[          160.770628] new event on queue: type=14
[          160.771650] queue_event nr_events 15
[          160.771650] new event on queue: type=11  ==> AUX event
[          160.771752] queue_event nr_events 16
[          160.771752] new event on queue: type=12
[          160.771798] queue_event nr_events 17
[          160.771798] new event on queue: type=14
[          160.773111] queue_event nr_events 18
[          160.773111] new event on queue: type=10  ==> MMAP2 event
[          160.782613] queue_event nr_events 19
[          160.782613] new event on queue: type=14

Thanks for the reviewing,

Leo

> > In this scenario, the perf main process runs on CPU0 and the profiled
> > program (which is forked child process) runs on CPU1.  The main process
> > opens PMU device for AUX trace (T3) and it will generate AUX event until
> > the perf process is scheduled out (T4); the profiled program will be
> > launched by execvp() (T2) and later will record MMAP event for memory
> > mapping (T5).
> > 
> > Usually, the AUX event will be later than MMAP2 event, but on the Arm
> > Juno platform, it has chance that AUX event occurs prior to MMAP2 event
> > with two reasons:
> > 
> > - Arm Juno platform is big.LITTLE architecture, so CPU0 is big CPU and
> >   CPU1 is LITTLE CPU, the performance between big CPU and LITTLE CPU is
> >   significant, this gives chance for the perf main process to run much
> >   faster than the profiled program;
> > 
> > - In the kernel, the RT thread (like kernel's CPUFreq thread) has chance
> >   to preempt perf main thread, so when the perf main thread is
> >   switched out, the AUX event will be generated and it might be early
> >   than profiled program's MMAP2 event.
> > 
> > To fix this issue, this patch records the first AUX event's timestamp
> > into 'aux_timestamp', if find any MMAP/MMAP2 event is late coming, it
> > fixes up the MMAP/MMAP2 events' timestamp as 'aux_timestamp-1', so the
> > MMAP/MMAP2 event will be inserted into ordered list ahead than AUX event
> > and also will be handled before AUX event.
> > 
> > Signed-off-by: Leo Yan <leo.yan@linaro.org>
> > ---
> >  tools/perf/util/session.c | 25 +++++++++++++++++++++++++
> >  1 file changed, 25 insertions(+)
> > 
> > diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
> > index 098080287c68..1aa54941bf81 100644
> > --- a/tools/perf/util/session.c
> > +++ b/tools/perf/util/session.c
> > @@ -1753,11 +1753,36 @@ static s64 perf_session__process_event(struct perf_session *session,
> >  
> >  	if (tool->ordered_events) {
> >  		u64 timestamp = -1ULL;
> > +		static u64 aux_timestamp = -1ULL;
> >  
> >  		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
> >  		if (ret && ret != -1)
> >  			return ret;
> >  
> > +		/*
> > +		 * Cache the first AUX event's timestamp into 'aux_timestamp',
> > +		 * which is used to fixup MMAP/MMAP2's timestamp.
> > +		 */
> > +		if ((event->header.type == PERF_RECORD_AUX) &&
> > +		    (aux_timestamp == -1ULL))
> > +			aux_timestamp = timestamp;
> > +
> > +		/*
> > +		 * If the AUX event arrives prior to MMAP/MMAP2 events, it's
> > +		 * possible to have no chance to create DSOs when decode AUX
> > +		 * trace data, thus the symbol cannot be parsed properly.
> > +		 * To allow the DSOs are prepared before process AUX event,
> > +		 * fixup the MMAP/MMAP2 events' timestamp to be prior to any
> > +		 * AUX event's timestamp, so MMAP/MMAP2 events will be
> > +		 * handled ahead and the DSO map will be prepared before AUX
> > +		 * event handling.
> > +		 */
> > +		if (event->header.type == PERF_RECORD_MMAP2 ||
> > +		    event->header.type == PERF_RECORD_MMAP) {
> > +			if (timestamp > aux_timestamp)
> > +				timestamp = aux_timestamp - 1;
> > +		}
> > +
> >  		ret = perf_session__queue_event(session, event, timestamp, file_offset);
> >  		if (ret != -ETIME)
> >  			return ret;
> > 
> 

^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] perf session: Fixup timestamp for ordered events
  2020-11-18 11:43   ` Leo Yan
@ 2020-11-18 15:57     ` Adrian Hunter
  2020-11-19  6:25       ` Leo Yan
  0 siblings, 1 reply; 5+ messages in thread
From: Adrian Hunter @ 2020-11-18 15:57 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, Alexey Budankov, Kan Liang, linux-kernel,
	Coresight ML

On 18/11/20 1:43 pm, Leo Yan wrote:
> Hi Adrian,
> 
> On Wed, Nov 18, 2020 at 01:15:32PM +0200, Adrian Hunter wrote:
>> On 18/11/20 12:55 pm, Leo Yan wrote:
>>> Perf tool relies on MMAP/MMAP2 events to prepare DSO maps, it allocates
>>> DSO maps for MMAP/MMAP2 events which is used for parsing symbol.  Thus,
>>> during the recording, perf tool implictly expects the MMAP/MMAP2 events
>>> should arrive before AUX event,
>>
>> No it doesn't.  There is an index and queues to queue PERF_RECORD_AUXTRACE
>> events in advance so that they can be processed in time order.
>>
>>  in other words, MMAP/MMAP2's timestamp
>>> should less than AUX event's timestamp, and the MMAP/MMAP2 events will
>>> be added into the front of ordered event list; this can allow the DSO
>>> maps to be prepared and can be consumed when process AUX event.
>>>
>>> See the function perf_evlist__prepare_workload(), though it uses pipes
>>> to handshake before perf process (the parent process) and forked process
>>> for the profiled program, it cannot promise the timing between
>>> MMAP/MMAP2 events and AUX event.
>>>
>>> On Arm Juno board, the AUX event can be observed to arrive ahead than
>>> MMAP2 event, with the command:
>>>
>>>   perf record -e cs_etm/@tmc_etr0/ -S -- dd if=/dev/zero of=/dev/null
>>>
>>> The follow diagram depicts the flow for how the AUX event is arriving
>>> ahead than MMAP2 event:
>>>
>>>        T1:          T3:                  T4:
>>>   perf process   Open PMU device    Perf is scheduled out;
>>>                                     invoke perf_aux_output_end()
>>> 	                            and generate AUX event
>>>        ^            ^                    ^
>>>        |            |                    |
>>>  CPU0 ---------------------------------------------------> (T)
>>>        \
>>>         \  Forked process is placed on another CPU
>>>          V
>>>  CPU1 ---------------------------------------------------> (T)
>>>               |                                |
>>>               V                                V
>>>               T2:                              T5:
>>>         Invoke execvp() for profiled       Record MMAP2 event
>>>       	program
>>
>> I don't understand.  An MMAP at T5 cannot affect data recorded before T5
>> (e.g. T4).
>> Are you saying the timestamps from different CPUs are not in time order?  Ot
>> are you saying the data is actually recorded after T5?
> 
> Here refers to the later one.
> 
> I observed the PMU device is opened and closed for multiple times, so
> the AUX events and MMAP2 events are interleave with each other.  Below
> is the dumping with debugging option "--debug ordered-events=1" (and I
> added code for printing out event type value).
> 
> When perf tool starts to decode the trace data for the first AUX event
> (its timestamp is 160.767623) and try to synthesize samples, it doesn't
> give chance for handling the sequential MMAP2 events (ts: 160.770223,
> 160.770451, 160.773111).
> 
> [          160.765623] queue_event nr_events 1
> [          160.765623] new event on queue: type=12
> [          160.765714] queue_event nr_events 2
> [          160.765714] new event on queue: type=3
> [          160.765822] queue_event nr_events 3
> [          160.765822] new event on queue: type=10  ==> MMAP2 event
> [          160.766180] queue_event nr_events 4
> [          160.766180] new event on queue: type=14
> [          160.767623] queue_event nr_events 5
> [          160.767623] new event on queue: type=11  ==> AUX event
> [          160.769710] queue_event nr_events 6
> [          160.769710] new event on queue: type=12
> [          160.769762] queue_event nr_events 7
> [          160.769762] new event on queue: type=14
> [          160.769810] queue_event nr_events 8
> [          160.769810] new event on queue: type=14
> [          160.769943] queue_event nr_events 9
> [          160.769943] new event on queue: type=11  ==> AUX event
> [          160.770028] queue_event nr_events 10
> [          160.770028] new event on queue: type=12
> [          160.770073] queue_event nr_events 11
> [          160.770073] new event on queue: type=14
> [          160.770223] queue_event nr_events 12
> [          160.770223] new event on queue: type=10  ==> MMAP2 event
> [          160.770451] queue_event nr_events 13
> [          160.770451] new event on queue: type=10  ==> MMAP2 event
> [          160.770628] queue_event nr_events 14
> [          160.770628] new event on queue: type=14
> [          160.771650] queue_event nr_events 15
> [          160.771650] new event on queue: type=11  ==> AUX event
> [          160.771752] queue_event nr_events 16
> [          160.771752] new event on queue: type=12
> [          160.771798] queue_event nr_events 17
> [          160.771798] new event on queue: type=14
> [          160.773111] queue_event nr_events 18
> [          160.773111] new event on queue: type=10  ==> MMAP2 event
> [          160.782613] queue_event nr_events 19
> [          160.782613] new event on queue: type=14
> 
> Thanks for the reviewing,
> 
> Leo
> 

When you say AUX event, do you mean PERF_RECORD_AUX or PERF_RECORD_AUXTRACE?

The kernel creates PERF_RECORD_AUX and the data range aux_offset/aux_size
should be for things prior to the sample time.

perf tool creates PERF_RECORD_AUXTRACE when copying data out of the aux
area. PERF_RECORD_AUXTRACE does not have a timestamp and can contain data
corresponding to several PERF_RECORD_AUX.  Data is queued based on
PERF_RECORD_AUXTRACE because that is what the data is attached to i.e. one
buffer can contain data from several PERF_RECORD_AUX events.

If you are processing data based on PERF_RECORD_AUX timestamp then you need
to pay attention to the offset.  PERF_RECORD_AUX gives you
aux_offset/aux_size and auxtrace_buffer (which may contain data from several
PERF_RECORD_AUX) gives you offset/size.


>>> In this scenario, the perf main process runs on CPU0 and the profiled
>>> program (which is forked child process) runs on CPU1.  The main process
>>> opens PMU device for AUX trace (T3) and it will generate AUX event until
>>> the perf process is scheduled out (T4); the profiled program will be
>>> launched by execvp() (T2) and later will record MMAP event for memory
>>> mapping (T5).
>>>
>>> Usually, the AUX event will be later than MMAP2 event, but on the Arm
>>> Juno platform, it has chance that AUX event occurs prior to MMAP2 event
>>> with two reasons:
>>>
>>> - Arm Juno platform is big.LITTLE architecture, so CPU0 is big CPU and
>>>   CPU1 is LITTLE CPU, the performance between big CPU and LITTLE CPU is
>>>   significant, this gives chance for the perf main process to run much
>>>   faster than the profiled program;
>>>
>>> - In the kernel, the RT thread (like kernel's CPUFreq thread) has chance
>>>   to preempt perf main thread, so when the perf main thread is
>>>   switched out, the AUX event will be generated and it might be early
>>>   than profiled program's MMAP2 event.
>>>
>>> To fix this issue, this patch records the first AUX event's timestamp
>>> into 'aux_timestamp', if find any MMAP/MMAP2 event is late coming, it
>>> fixes up the MMAP/MMAP2 events' timestamp as 'aux_timestamp-1', so the
>>> MMAP/MMAP2 event will be inserted into ordered list ahead than AUX event
>>> and also will be handled before AUX event.
>>>
>>> Signed-off-by: Leo Yan <leo.yan@linaro.org>
>>> ---
>>>  tools/perf/util/session.c | 25 +++++++++++++++++++++++++
>>>  1 file changed, 25 insertions(+)
>>>
>>> diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
>>> index 098080287c68..1aa54941bf81 100644
>>> --- a/tools/perf/util/session.c
>>> +++ b/tools/perf/util/session.c
>>> @@ -1753,11 +1753,36 @@ static s64 perf_session__process_event(struct perf_session *session,
>>>  
>>>  	if (tool->ordered_events) {
>>>  		u64 timestamp = -1ULL;
>>> +		static u64 aux_timestamp = -1ULL;
>>>  
>>>  		ret = perf_evlist__parse_sample_timestamp(evlist, event, &timestamp);
>>>  		if (ret && ret != -1)
>>>  			return ret;
>>>  
>>> +		/*
>>> +		 * Cache the first AUX event's timestamp into 'aux_timestamp',
>>> +		 * which is used to fixup MMAP/MMAP2's timestamp.
>>> +		 */
>>> +		if ((event->header.type == PERF_RECORD_AUX) &&
>>> +		    (aux_timestamp == -1ULL))
>>> +			aux_timestamp = timestamp;
>>> +
>>> +		/*
>>> +		 * If the AUX event arrives prior to MMAP/MMAP2 events, it's
>>> +		 * possible to have no chance to create DSOs when decode AUX
>>> +		 * trace data, thus the symbol cannot be parsed properly.
>>> +		 * To allow the DSOs are prepared before process AUX event,
>>> +		 * fixup the MMAP/MMAP2 events' timestamp to be prior to any
>>> +		 * AUX event's timestamp, so MMAP/MMAP2 events will be
>>> +		 * handled ahead and the DSO map will be prepared before AUX
>>> +		 * event handling.
>>> +		 */
>>> +		if (event->header.type == PERF_RECORD_MMAP2 ||
>>> +		    event->header.type == PERF_RECORD_MMAP) {
>>> +			if (timestamp > aux_timestamp)
>>> +				timestamp = aux_timestamp - 1;
>>> +		}
>>> +
>>>  		ret = perf_session__queue_event(session, event, timestamp, file_offset);
>>>  		if (ret != -ETIME)
>>>  			return ret;
>>>
>>


^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [RFC PATCH] perf session: Fixup timestamp for ordered events
  2020-11-18 15:57     ` Adrian Hunter
@ 2020-11-19  6:25       ` Leo Yan
  0 siblings, 0 replies; 5+ messages in thread
From: Leo Yan @ 2020-11-19  6:25 UTC (permalink / raw)
  To: Adrian Hunter
  Cc: Arnaldo Carvalho de Melo, Jiri Olsa, Mark Rutland,
	Mathieu Poirier, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, Alexey Budankov, Kan Liang, linux-kernel,
	Coresight ML

Hi Adrian,

On Wed, Nov 18, 2020 at 05:57:09PM +0200, Adrian Hunter wrote:

[...]

> >>> The follow diagram depicts the flow for how the AUX event is arriving
> >>> ahead than MMAP2 event:
> >>>
> >>>        T1:          T3:                  T4:
> >>>   perf process   Open PMU device    Perf is scheduled out;
> >>>                                     invoke perf_aux_output_end()
> >>> 	                            and generate AUX event
> >>>        ^            ^                    ^
> >>>        |            |                    |
> >>>  CPU0 ---------------------------------------------------> (T)
> >>>        \
> >>>         \  Forked process is placed on another CPU
> >>>          V
> >>>  CPU1 ---------------------------------------------------> (T)
> >>>               |                                |
> >>>               V                                V
> >>>               T2:                              T5:
> >>>         Invoke execvp() for profiled       Record MMAP2 event
> >>>       	program
> >>
> >> I don't understand.  An MMAP at T5 cannot affect data recorded before T5
> >> (e.g. T4).
> >> Are you saying the timestamps from different CPUs are not in time order?  Ot
> >> are you saying the data is actually recorded after T5?
> > 
> > Here refers to the later one.
> > 
> > I observed the PMU device is opened and closed for multiple times, so
> > the AUX events and MMAP2 events are interleave with each other.  Below
> > is the dumping with debugging option "--debug ordered-events=1" (and I
> > added code for printing out event type value).
> > 
> > When perf tool starts to decode the trace data for the first AUX event
> > (its timestamp is 160.767623) and try to synthesize samples, it doesn't
> > give chance for handling the sequential MMAP2 events (ts: 160.770223,
> > 160.770451, 160.773111).
> > 
> > [          160.765623] queue_event nr_events 1
> > [          160.765623] new event on queue: type=12
> > [          160.765714] queue_event nr_events 2
> > [          160.765714] new event on queue: type=3
> > [          160.765822] queue_event nr_events 3
> > [          160.765822] new event on queue: type=10  ==> MMAP2 event
> > [          160.766180] queue_event nr_events 4
> > [          160.766180] new event on queue: type=14
> > [          160.767623] queue_event nr_events 5
> > [          160.767623] new event on queue: type=11  ==> AUX event
> > [          160.769710] queue_event nr_events 6
> > [          160.769710] new event on queue: type=12
> > [          160.769762] queue_event nr_events 7
> > [          160.769762] new event on queue: type=14
> > [          160.769810] queue_event nr_events 8
> > [          160.769810] new event on queue: type=14
> > [          160.769943] queue_event nr_events 9
> > [          160.769943] new event on queue: type=11  ==> AUX event
> > [          160.770028] queue_event nr_events 10
> > [          160.770028] new event on queue: type=12
> > [          160.770073] queue_event nr_events 11
> > [          160.770073] new event on queue: type=14
> > [          160.770223] queue_event nr_events 12
> > [          160.770223] new event on queue: type=10  ==> MMAP2 event
> > [          160.770451] queue_event nr_events 13
> > [          160.770451] new event on queue: type=10  ==> MMAP2 event
> > [          160.770628] queue_event nr_events 14
> > [          160.770628] new event on queue: type=14
> > [          160.771650] queue_event nr_events 15
> > [          160.771650] new event on queue: type=11  ==> AUX event
> > [          160.771752] queue_event nr_events 16
> > [          160.771752] new event on queue: type=12
> > [          160.771798] queue_event nr_events 17
> > [          160.771798] new event on queue: type=14
> > [          160.773111] queue_event nr_events 18
> > [          160.773111] new event on queue: type=10  ==> MMAP2 event
> > [          160.782613] queue_event nr_events 19
> > [          160.782613] new event on queue: type=14
> > 
> > Thanks for the reviewing,
> > 
> > Leo
> > 
> 
> When you say AUX event, do you mean PERF_RECORD_AUX or PERF_RECORD_AUXTRACE?

The AUX event is PERF_RECORD_AUX.

> The kernel creates PERF_RECORD_AUX and the data range aux_offset/aux_size
> should be for things prior to the sample time.
> 
> perf tool creates PERF_RECORD_AUXTRACE when copying data out of the aux
> area. PERF_RECORD_AUXTRACE does not have a timestamp and can contain data
> corresponding to several PERF_RECORD_AUX.  Data is queued based on
> PERF_RECORD_AUXTRACE because that is what the data is attached to i.e. one
> buffer can contain data from several PERF_RECORD_AUX events.
> 
> If you are processing data based on PERF_RECORD_AUX timestamp then you need
> to pay attention to the offset.  PERF_RECORD_AUX gives you
> aux_offset/aux_size and auxtrace_buffer (which may contain data from several
> PERF_RECORD_AUX) gives you offset/size.

Yeah, as you suggested, in this case the key point is for the first
PERF_RECORD_AUX event handling, in theory it should contain very short
trace data and exit the decoding quickly.  The issue is the first
PERF_RECORD_AUX event will decode all AUX trace data; and near the end
of events handling, the tool finally has chance to process MMAP2 events.

So this RFC patch is not the right way, I will dig more for
PERF_RECORD_AUX event handling for cs-etm.

Thanks for the detailed explanation and suggetions.

Leo

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2020-11-19  6:26 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2020-11-18 10:55 [RFC PATCH] perf session: Fixup timestamp for ordered events Leo Yan
2020-11-18 11:15 ` Adrian Hunter
2020-11-18 11:43   ` Leo Yan
2020-11-18 15:57     ` Adrian Hunter
2020-11-19  6:25       ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).