All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-06-09 13:04 ` James Clark
  0 siblings, 0 replies; 14+ messages in thread
From: James Clark @ 2021-06-09 13:04 UTC (permalink / raw)
  To: acme, mathieu.poirier, coresight, leo.yan
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose,
	anshuman.khandual, James Clark, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.

This can cause the "data has no samples!" error if the first
PERF_RECORD_AUX record comes before the first (or any relevant)
PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
to access the binary data.

This change pushes the start of non-timeless decoding to the very end of
parsing the file. The PERF_RECORD_EXIT event can't be used because it
might not exist in system-wide or snapshot modes.

I have not been able to find the exact cause for the events to be
intermittently in the wrong order in the basic scenario:

	perf record -e cs_etm/@tmc_etr0/u top

But it can be made to happen every time with the --delay option. This is
because "enable_on_exec" is disabled, which causes tracing to start
before the process to be launched is exec'd. For example:

	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
	perf report -D | grep 'AUX\|MAP'

	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so

Another scenario in which decoding from the first aux record fails is a
workload that forks. Although the aux record comes after 'bash', it
comes before 'top', which is what we are interested in. For example:

	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
	perf report -D | grep 'AUX\|MAP'

	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
	...
	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]

A third scenario is when the majority of time is spent in a shared
library that is not loaded at startup. For example a dynamically loaded
plugin.

Testing
=======

Testing was done by checking if any samples that are present in the
old output are missing from the new output. Timestamps must be
stripped out with awk because now they are set to the last AUX sample,
rather than the first:

	./perf script $4 | awk '!($4="")' > new.script
	./perf-default script $4 | awk '!($4="")' > default.script
	comm -13 <(sort -u new.script) <(sort -u default.script)

Testing showed that the new output is a superset of the old. When lines
appear in the comm output, it is not because they are missing but
because [unknown] is now resolved to sensible locations. For example
last putp branch here now resolves to libtinfo, so it's not missing
from the output, but is actually improved:

Old:
	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
New:
	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)

In the following two modes, decoding now works and the "data has no
samples!" error is not displayed any more:

	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
	perf record -e cs_etm/@tmc_etr0/u --delay=1 top

In snapshot mode, there is also an improvement to decoding. Previously
samples for the 'kill' process that was used to send SIGUSR2 were
completely missing, because the process hadn't started yet. But now
there are additional samples present:

	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
	perf script

		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)

Also tested was the round trip of 'perf inject' followed by 'perf
report' which has the same differences and improvements.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 57aea2c7fc77..ceed0b038796 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
 			return err;
 	}
 
+	/*
+	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
+	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
+	 * ETM_OPT_CTXTID is not enabled.
+	 */
 	if (etm->timeless_decoding &&
 	    event->header.type == PERF_RECORD_EXIT)
 		return cs_etm__process_timeless_queues(etm,
@@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
 		 * onwards.
 		 */
 		etm->latest_kernel_timestamp = sample_kernel_timestamp;
-		return cs_etm__process_queues(etm);
 	}
 
 	return 0;
-- 
2.28.0


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-06-09 13:04 ` James Clark
  0 siblings, 0 replies; 14+ messages in thread
From: James Clark @ 2021-06-09 13:04 UTC (permalink / raw)
  To: acme, mathieu.poirier, coresight, leo.yan
  Cc: al.grant, branislav.rankov, denik, suzuki.poulose,
	anshuman.khandual, James Clark, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.

This can cause the "data has no samples!" error if the first
PERF_RECORD_AUX record comes before the first (or any relevant)
PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
to access the binary data.

This change pushes the start of non-timeless decoding to the very end of
parsing the file. The PERF_RECORD_EXIT event can't be used because it
might not exist in system-wide or snapshot modes.

I have not been able to find the exact cause for the events to be
intermittently in the wrong order in the basic scenario:

	perf record -e cs_etm/@tmc_etr0/u top

But it can be made to happen every time with the --delay option. This is
because "enable_on_exec" is disabled, which causes tracing to start
before the process to be launched is exec'd. For example:

	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
	perf report -D | grep 'AUX\|MAP'

	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so

Another scenario in which decoding from the first aux record fails is a
workload that forks. Although the aux record comes after 'bash', it
comes before 'top', which is what we are interested in. For example:

	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
	perf report -D | grep 'AUX\|MAP'

	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
	...
	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]

A third scenario is when the majority of time is spent in a shared
library that is not loaded at startup. For example a dynamically loaded
plugin.

Testing
=======

Testing was done by checking if any samples that are present in the
old output are missing from the new output. Timestamps must be
stripped out with awk because now they are set to the last AUX sample,
rather than the first:

	./perf script $4 | awk '!($4="")' > new.script
	./perf-default script $4 | awk '!($4="")' > default.script
	comm -13 <(sort -u new.script) <(sort -u default.script)

Testing showed that the new output is a superset of the old. When lines
appear in the comm output, it is not because they are missing but
because [unknown] is now resolved to sensible locations. For example
last putp branch here now resolves to libtinfo, so it's not missing
from the output, but is actually improved:

Old:
	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
New:
	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)

In the following two modes, decoding now works and the "data has no
samples!" error is not displayed any more:

	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
	perf record -e cs_etm/@tmc_etr0/u --delay=1 top

In snapshot mode, there is also an improvement to decoding. Previously
samples for the 'kill' process that was used to send SIGUSR2 were
completely missing, because the process hadn't started yet. But now
there are additional samples present:

	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
	perf script

		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)

Also tested was the round trip of 'perf inject' followed by 'perf
report' which has the same differences and improvements.

Signed-off-by: James Clark <james.clark@arm.com>
---
 tools/perf/util/cs-etm.c | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
index 57aea2c7fc77..ceed0b038796 100644
--- a/tools/perf/util/cs-etm.c
+++ b/tools/perf/util/cs-etm.c
@@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
 			return err;
 	}
 
+	/*
+	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
+	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
+	 * ETM_OPT_CTXTID is not enabled.
+	 */
 	if (etm->timeless_decoding &&
 	    event->header.type == PERF_RECORD_EXIT)
 		return cs_etm__process_timeless_queues(etm,
@@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
 		 * onwards.
 		 */
 		etm->latest_kernel_timestamp = sample_kernel_timestamp;
-		return cs_etm__process_queues(etm);
 	}
 
 	return 0;
-- 
2.28.0


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-06-09 13:04 ` James Clark
@ 2021-06-10 16:24   ` Mathieu Poirier
  -1 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-10 16:24 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

This patch doesn't apply on coresight next.  

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 57aea2c7fc77..ceed0b038796 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
>  			return err;
>  	}
>  
> +	/*
> +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> +	 * ETM_OPT_CTXTID is not enabled.
> +	 */
>  	if (etm->timeless_decoding &&
>  	    event->header.type == PERF_RECORD_EXIT)
>  		return cs_etm__process_timeless_queues(etm,
> @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
>  		 * onwards.
>  		 */
>  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> -		return cs_etm__process_queues(etm);
>  	}
>  
>  	return 0;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-06-10 16:24   ` Mathieu Poirier
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-10 16:24 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

This patch doesn't apply on coresight next.  

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 57aea2c7fc77..ceed0b038796 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
>  			return err;
>  	}
>  
> +	/*
> +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> +	 * ETM_OPT_CTXTID is not enabled.
> +	 */
>  	if (etm->timeless_decoding &&
>  	    event->header.type == PERF_RECORD_EXIT)
>  		return cs_etm__process_timeless_queues(etm,
> @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
>  		 * onwards.
>  		 */
>  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> -		return cs_etm__process_queues(etm);
>  	}
>  
>  	return 0;
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-06-10 16:24   ` Mathieu Poirier
@ 2021-06-10 16:39     ` Mathieu Poirier
  -1 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-10 16:39 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Thu, Jun 10, 2021 at 10:24:53AM -0600, Mathieu Poirier wrote:
> This patch doesn't apply on coresight next.  

I was able to apply this patch to Arnaldo's tree.

> 
> On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > 
> > This can cause the "data has no samples!" error if the first
> > PERF_RECORD_AUX record comes before the first (or any relevant)
> > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > to access the binary data.
> > 
> > This change pushes the start of non-timeless decoding to the very end of
> > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > might not exist in system-wide or snapshot modes.
> > 
> > I have not been able to find the exact cause for the events to be
> > intermittently in the wrong order in the basic scenario:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u top
> > 
> > But it can be made to happen every time with the --delay option. This is
> > because "enable_on_exec" is disabled, which causes tracing to start
> > before the process to be launched is exec'd. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> > 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> > 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> > 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 
> > Another scenario in which decoding from the first aux record fails is a
> > workload that forks. Although the aux record comes after 'bash', it
> > comes before 'top', which is what we are interested in. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> > 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> > 	...
> > 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> > 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 
> > A third scenario is when the majority of time is spent in a shared
> > library that is not loaded at startup. For example a dynamically loaded
> > plugin.
> > 
> > Testing
> > =======
> > 
> > Testing was done by checking if any samples that are present in the
> > old output are missing from the new output. Timestamps must be
> > stripped out with awk because now they are set to the last AUX sample,
> > rather than the first:
> > 
> > 	./perf script $4 | awk '!($4="")' > new.script
> > 	./perf-default script $4 | awk '!($4="")' > default.script
> > 	comm -13 <(sort -u new.script) <(sort -u default.script)
> > 
> > Testing showed that the new output is a superset of the old. When lines
> > appear in the comm output, it is not because they are missing but
> > because [unknown] is now resolved to sensible locations. For example
> > last putp branch here now resolves to libtinfo, so it's not missing
> > from the output, but is actually improved:
> > 
> > Old:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> > New:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> > 
> > In the following two modes, decoding now works and the "data has no
> > samples!" error is not displayed any more:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 
> > In snapshot mode, there is also an improvement to decoding. Previously
> > samples for the 'kill' process that was used to send SIGUSR2 were
> > completely missing, because the process hadn't started yet. But now
> > there are additional samples present:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> > 	perf script
> > 
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> > 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> > 
> > Also tested was the round trip of 'perf inject' followed by 'perf
> > report' which has the same differences and improvements.
> > 
> > Signed-off-by: James Clark <james.clark@arm.com>
> > ---
> >  tools/perf/util/cs-etm.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 57aea2c7fc77..ceed0b038796 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
> >  			return err;
> >  	}
> >  
> > +	/*
> > +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> > +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> > +	 * ETM_OPT_CTXTID is not enabled.
> > +	 */
> >  	if (etm->timeless_decoding &&
> >  	    event->header.type == PERF_RECORD_EXIT)
> >  		return cs_etm__process_timeless_queues(etm,
> > @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
> >  		 * onwards.
> >  		 */
> >  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> > -		return cs_etm__process_queues(etm);
> >  	}
> >  
> >  	return 0;
> > -- 
> > 2.28.0
> > 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-06-10 16:39     ` Mathieu Poirier
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-10 16:39 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Thu, Jun 10, 2021 at 10:24:53AM -0600, Mathieu Poirier wrote:
> This patch doesn't apply on coresight next.  

I was able to apply this patch to Arnaldo's tree.

> 
> On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > 
> > This can cause the "data has no samples!" error if the first
> > PERF_RECORD_AUX record comes before the first (or any relevant)
> > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > to access the binary data.
> > 
> > This change pushes the start of non-timeless decoding to the very end of
> > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > might not exist in system-wide or snapshot modes.
> > 
> > I have not been able to find the exact cause for the events to be
> > intermittently in the wrong order in the basic scenario:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u top
> > 
> > But it can be made to happen every time with the --delay option. This is
> > because "enable_on_exec" is disabled, which causes tracing to start
> > before the process to be launched is exec'd. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> > 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> > 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> > 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 
> > Another scenario in which decoding from the first aux record fails is a
> > workload that forks. Although the aux record comes after 'bash', it
> > comes before 'top', which is what we are interested in. For example:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf report -D | grep 'AUX\|MAP'
> > 
> > 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> > 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> > 	...
> > 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> > 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> > 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> > 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> > 
> > A third scenario is when the majority of time is spent in a shared
> > library that is not loaded at startup. For example a dynamically loaded
> > plugin.
> > 
> > Testing
> > =======
> > 
> > Testing was done by checking if any samples that are present in the
> > old output are missing from the new output. Timestamps must be
> > stripped out with awk because now they are set to the last AUX sample,
> > rather than the first:
> > 
> > 	./perf script $4 | awk '!($4="")' > new.script
> > 	./perf-default script $4 | awk '!($4="")' > default.script
> > 	comm -13 <(sort -u new.script) <(sort -u default.script)
> > 
> > Testing showed that the new output is a superset of the old. When lines
> > appear in the comm output, it is not because they are missing but
> > because [unknown] is now resolved to sensible locations. For example
> > last putp branch here now resolves to libtinfo, so it's not missing
> > from the output, but is actually improved:
> > 
> > Old:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> > New:
> > 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> > 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> > 
> > In the following two modes, decoding now works and the "data has no
> > samples!" error is not displayed any more:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> > 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> > 
> > In snapshot mode, there is also an improvement to decoding. Previously
> > samples for the 'kill' process that was used to send SIGUSR2 were
> > completely missing, because the process hadn't started yet. But now
> > there are additional samples present:
> > 
> > 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> > 	perf script
> > 
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> > 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> > 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> > 
> > Also tested was the round trip of 'perf inject' followed by 'perf
> > report' which has the same differences and improvements.
> > 
> > Signed-off-by: James Clark <james.clark@arm.com>
> > ---
> >  tools/perf/util/cs-etm.c | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> > 
> > diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> > index 57aea2c7fc77..ceed0b038796 100644
> > --- a/tools/perf/util/cs-etm.c
> > +++ b/tools/perf/util/cs-etm.c
> > @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
> >  			return err;
> >  	}
> >  
> > +	/*
> > +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> > +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> > +	 * ETM_OPT_CTXTID is not enabled.
> > +	 */
> >  	if (etm->timeless_decoding &&
> >  	    event->header.type == PERF_RECORD_EXIT)
> >  		return cs_etm__process_timeless_queues(etm,
> > @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
> >  		 * onwards.
> >  		 */
> >  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> > -		return cs_etm__process_queues(etm);
> >  	}
> >  
> >  	return 0;
> > -- 
> > 2.28.0
> > 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-06-09 13:04 ` James Clark
@ 2021-06-14 15:54   ` Mathieu Poirier
  -1 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-14 15:54 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 57aea2c7fc77..ceed0b038796 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
>  			return err;
>  	}
>  
> +	/*
> +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> +	 * ETM_OPT_CTXTID is not enabled.
> +	 */
>  	if (etm->timeless_decoding &&
>  	    event->header.type == PERF_RECORD_EXIT)
>  		return cs_etm__process_timeless_queues(etm,
> @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
>  		 * onwards.
>  		 */
>  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> -		return cs_etm__process_queues(etm);

This looks good to me but I'd like to have other people testing it.

Thanks,
Mathieu

>  	}
>  
>  	return 0;
> -- 
> 2.28.0
> 

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-06-14 15:54   ` Mathieu Poirier
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-06-14 15:54 UTC (permalink / raw)
  To: James Clark
  Cc: acme, coresight, leo.yan, al.grant, branislav.rankov, denik,
	suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>
> ---
>  tools/perf/util/cs-etm.c | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/perf/util/cs-etm.c b/tools/perf/util/cs-etm.c
> index 57aea2c7fc77..ceed0b038796 100644
> --- a/tools/perf/util/cs-etm.c
> +++ b/tools/perf/util/cs-etm.c
> @@ -2407,6 +2407,11 @@ static int cs_etm__process_event(struct perf_session *session,
>  			return err;
>  	}
>  
> +	/*
> +	 * Don't wait for cs_etm__flush_events() in per-thread/timeless mode to start the decode. We
> +	 * need the tid of the PERF_RECORD_EXIT event to assign to the synthesised samples because
> +	 * ETM_OPT_CTXTID is not enabled.
> +	 */
>  	if (etm->timeless_decoding &&
>  	    event->header.type == PERF_RECORD_EXIT)
>  		return cs_etm__process_timeless_queues(etm,
> @@ -2424,7 +2429,6 @@ static int cs_etm__process_event(struct perf_session *session,
>  		 * onwards.
>  		 */
>  		etm->latest_kernel_timestamp = sample_kernel_timestamp;
> -		return cs_etm__process_queues(etm);

This looks good to me but I'd like to have other people testing it.

Thanks,
Mathieu

>  	}
>  
>  	return 0;
> -- 
> 2.28.0
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-06-09 13:04 ` James Clark
@ 2021-07-01 13:34   ` Leo Yan
  -1 siblings, 0 replies; 14+ messages in thread
From: Leo Yan @ 2021-07-01 13:34 UTC (permalink / raw)
  To: James Clark
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>

Excellent work, James!  I learned several things from it :)

I went through the testing cases one by one, have verified all
elaborated cases (but I don't verify injection cases):

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-07-01 13:34   ` Leo Yan
  0 siblings, 0 replies; 14+ messages in thread
From: Leo Yan @ 2021-07-01 13:34 UTC (permalink / raw)
  To: James Clark
  Cc: acme, mathieu.poirier, coresight, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> 
> This can cause the "data has no samples!" error if the first
> PERF_RECORD_AUX record comes before the first (or any relevant)
> PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> to access the binary data.
> 
> This change pushes the start of non-timeless decoding to the very end of
> parsing the file. The PERF_RECORD_EXIT event can't be used because it
> might not exist in system-wide or snapshot modes.
> 
> I have not been able to find the exact cause for the events to be
> intermittently in the wrong order in the basic scenario:
> 
> 	perf record -e cs_etm/@tmc_etr0/u top
> 
> But it can be made to happen every time with the --delay option. This is
> because "enable_on_exec" is disabled, which causes tracing to start
> before the process to be launched is exec'd. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	0 16714475632740 0x520 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x30 flags: 0 []
> 	0 16714476494960 0x5d0 [0x40]: PERF_RECORD_AUX offset: 0x30 size: 0x30 flags: 0 []
> 	0 16714478208900 0x660 [0x40]: PERF_RECORD_AUX offset: 0x60 size: 0x30 flags: 0 []
> 	4294967295 16714478293340 0x700 [0x70]: PERF_RECORD_MMAP2 8712/8712: [0x557a460000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16714478353020 0x770 [0x88]: PERF_RECORD_MMAP2 8712/8712: [0x7f86f72000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 
> Another scenario in which decoding from the first aux record fails is a
> workload that forks. Although the aux record comes after 'bash', it
> comes before 'top', which is what we are interested in. For example:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf report -D | grep 'AUX\|MAP'
> 
> 	4294967295 16853946421300 0x510 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x558f280000(0x142000) @ 0 00:17 5213953 0]: r-xp /usr/bin/bash
> 	4294967295 16853946543560 0x580 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba6e000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853946628420 0x608 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7fbba9e000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 	0 16853947067300 0x690 [0x40]: PERF_RECORD_AUX offset: 0 size: 0x3a60 flags: 0 []
> 	...
> 	0 16853966602580 0x1758 [0x40]: PERF_RECORD_AUX offset: 0xc2470 size: 0x30 flags: 0 []
> 	4294967295 16853967119860 0x1818 [0x70]: PERF_RECORD_MMAP2 8723/8723: [0x5559e70000(0x54000) @ 0 00:17 5329258 0]: r-xp /usr/bin/top
> 	4294967295 16853967181620 0x1888 [0x88]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed06000(0x34000) @ 0 00:17 5214354 0]: r-xp /usr/lib/aarch64-linux-gnu/ld-2.31.so
> 	4294967295 16853967237180 0x1910 [0x68]: PERF_RECORD_MMAP2 8723/8723: [0x7f9ed36000(0x1000) @ 0 00:00 0 0]: r-xp [vdso]
> 
> A third scenario is when the majority of time is spent in a shared
> library that is not loaded at startup. For example a dynamically loaded
> plugin.
> 
> Testing
> =======
> 
> Testing was done by checking if any samples that are present in the
> old output are missing from the new output. Timestamps must be
> stripped out with awk because now they are set to the last AUX sample,
> rather than the first:
> 
> 	./perf script $4 | awk '!($4="")' > new.script
> 	./perf-default script $4 | awk '!($4="")' > default.script
> 	comm -13 <(sort -u new.script) <(sort -u default.script)
> 
> Testing showed that the new output is a superset of the old. When lines
> appear in the comm output, it is not because they are missing but
> because [unknown] is now resolved to sensible locations. For example
> last putp branch here now resolves to libtinfo, so it's not missing
> from the output, but is actually improved:
> 
> Old:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 0 [unknown] ([unknown])
> New:
> 	top 305 [001]  1 branches:uH: 402830 _init+0x30 (/usr/bin/top.procps) => 404a1c [unknown] (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 404a20 [unknown] (/usr/bin/top.procps) => 402970 putp@plt+0x0 (/usr/bin/top.procps)
> 	top 305 [001]  1 branches:uH: 40297c putp@plt+0xc (/usr/bin/top.procps) => 7f8ab39208 putp+0x0 (/lib/libtinfo.so.5.9)
> 
> In the following two modes, decoding now works and the "data has no
> samples!" error is not displayed any more:
> 
> 	perf record -e cs_etm/@tmc_etr0/u -- bash -c top
> 	perf record -e cs_etm/@tmc_etr0/u --delay=1 top
> 
> In snapshot mode, there is also an improvement to decoding. Previously
> samples for the 'kill' process that was used to send SIGUSR2 were
> completely missing, because the process hadn't started yet. But now
> there are additional samples present:
> 
> 	perf record -e cs_etm/@tmc_etr0/u --snapshot -a
> 	perf script
> 
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      aaaabb612fb4 [unknown] (/usr/bin/stress)
> 		  kill 19644 [000] 161627.938153:    1000000    instructions:uH:      ffffae0ef210 [unknown] (/lib/aarch64-linux-gnu/ld-2.27.so)
> 		stress 19380 [003] 161627.938153:    1000000    instructions:uH:      ffff9e754d40 random_r+0x20 (/lib/aarch64-linux-gnu/libc-2.27.so)
> 
> Also tested was the round trip of 'perf inject' followed by 'perf
> report' which has the same differences and improvements.
> 
> Signed-off-by: James Clark <james.clark@arm.com>

Excellent work, James!  I learned several things from it :)

I went through the testing cases one by one, have verified all
elaborated cases (but I don't verify injection cases):

Reviewed-by: Leo Yan <leo.yan@linaro.org>
Tested-by: Leo Yan <leo.yan@linaro.org>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-06-14 15:54   ` Mathieu Poirier
@ 2021-07-01 17:15     ` Arnaldo Carvalho de Melo
  -1 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-07-01 17:15 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Em Mon, Jun 14, 2021 at 09:54:42AM -0600, Mathieu Poirier escreveu:
> On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > 
> > This can cause the "data has no samples!" error if the first
> > PERF_RECORD_AUX record comes before the first (or any relevant)
> > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > to access the binary data.
> > 
> > This change pushes the start of non-timeless decoding to the very end of
> > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > might not exist in system-wide or snapshot modes.

<SNIP>

> This looks good to me but I'd like to have other people testing it.

So Leo reviewed and tested this, I'm taking your "looks good to me" as
an Acked-by, as per Documentation/process/submitting-patches.rst.

- Arnaldo
 
> Thanks,
> Mathieu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-07-01 17:15     ` Arnaldo Carvalho de Melo
  0 siblings, 0 replies; 14+ messages in thread
From: Arnaldo Carvalho de Melo @ 2021-07-01 17:15 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: James Clark, coresight, leo.yan, al.grant, branislav.rankov,
	denik, suzuki.poulose, anshuman.khandual, Mike Leach, John Garry,
	Will Deacon, Mark Rutland, Alexander Shishkin, Jiri Olsa,
	Namhyung Kim, linux-arm-kernel, linux-perf-users, linux-kernel

Em Mon, Jun 14, 2021 at 09:54:42AM -0600, Mathieu Poirier escreveu:
> On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > 
> > This can cause the "data has no samples!" error if the first
> > PERF_RECORD_AUX record comes before the first (or any relevant)
> > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > to access the binary data.
> > 
> > This change pushes the start of non-timeless decoding to the very end of
> > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > might not exist in system-wide or snapshot modes.

<SNIP>

> This looks good to me but I'd like to have other people testing it.

So Leo reviewed and tested this, I'm taking your "looks good to me" as
an Acked-by, as per Documentation/process/submitting-patches.rst.

- Arnaldo
 
> Thanks,
> Mathieu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
  2021-07-01 17:15     ` Arnaldo Carvalho de Melo
@ 2021-07-01 21:33       ` Mathieu Poirier
  -1 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-07-01 21:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: James Clark, Coresight ML, Leo Yan, Al Grant, Branislav Rankov,
	Denis Nikitin, Suzuki K. Poulose, Anshuman Khandual, Mike Leach,
	John Garry, Will Deacon, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	Linux Kernel Mailing List

On Thu, 1 Jul 2021 at 11:15, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Mon, Jun 14, 2021 at 09:54:42AM -0600, Mathieu Poirier escreveu:
> > On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > >
> > > This can cause the "data has no samples!" error if the first
> > > PERF_RECORD_AUX record comes before the first (or any relevant)
> > > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > > to access the binary data.
> > >
> > > This change pushes the start of non-timeless decoding to the very end of
> > > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > > might not exist in system-wide or snapshot modes.
>
> <SNIP>
>
> > This looks good to me but I'd like to have other people testing it.
>
> So Leo reviewed and tested this, I'm taking your "looks good to me" as
> an Acked-by, as per Documentation/process/submitting-patches.rst.
>

Yes, please.  My goal here was to wait for other people to test this
code, which Leo did.

> - Arnaldo
>
> > Thanks,
> > Mathieu

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events()
@ 2021-07-01 21:33       ` Mathieu Poirier
  0 siblings, 0 replies; 14+ messages in thread
From: Mathieu Poirier @ 2021-07-01 21:33 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo
  Cc: James Clark, Coresight ML, Leo Yan, Al Grant, Branislav Rankov,
	Denis Nikitin, Suzuki K. Poulose, Anshuman Khandual, Mike Leach,
	John Garry, Will Deacon, Mark Rutland, Alexander Shishkin,
	Jiri Olsa, Namhyung Kim, linux-arm-kernel, linux-perf-users,
	Linux Kernel Mailing List

On Thu, 1 Jul 2021 at 11:15, Arnaldo Carvalho de Melo <acme@kernel.org> wrote:
>
> Em Mon, Jun 14, 2021 at 09:54:42AM -0600, Mathieu Poirier escreveu:
> > On Wed, Jun 09, 2021 at 04:04:20PM +0300, James Clark wrote:
> > > Currently, timeless mode starts the decode on PERF_RECORD_EXIT, and
> > > non-timeless mode starts decoding on the fist PERF_RECORD_AUX record.
> > >
> > > This can cause the "data has no samples!" error if the first
> > > PERF_RECORD_AUX record comes before the first (or any relevant)
> > > PERF_RECORD_MMAP2 record because the mmaps are required by the decoder
> > > to access the binary data.
> > >
> > > This change pushes the start of non-timeless decoding to the very end of
> > > parsing the file. The PERF_RECORD_EXIT event can't be used because it
> > > might not exist in system-wide or snapshot modes.
>
> <SNIP>
>
> > This looks good to me but I'd like to have other people testing it.
>
> So Leo reviewed and tested this, I'm taking your "looks good to me" as
> an Acked-by, as per Documentation/process/submitting-patches.rst.
>

Yes, please.  My goal here was to wait for other people to test this
code, which Leo did.

> - Arnaldo
>
> > Thanks,
> > Mathieu

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2021-07-01 21:35 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-09 13:04 [PATCH] perf cs-etm: Delay decode of non-timeless data until cs_etm__flush_events() James Clark
2021-06-09 13:04 ` James Clark
2021-06-10 16:24 ` Mathieu Poirier
2021-06-10 16:24   ` Mathieu Poirier
2021-06-10 16:39   ` Mathieu Poirier
2021-06-10 16:39     ` Mathieu Poirier
2021-06-14 15:54 ` Mathieu Poirier
2021-06-14 15:54   ` Mathieu Poirier
2021-07-01 17:15   ` Arnaldo Carvalho de Melo
2021-07-01 17:15     ` Arnaldo Carvalho de Melo
2021-07-01 21:33     ` Mathieu Poirier
2021-07-01 21:33       ` Mathieu Poirier
2021-07-01 13:34 ` Leo Yan
2021-07-01 13:34   ` Leo Yan

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.