linux-perf-users.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v1 0/3] coresight: Fix for snapshot mode
@ 2021-05-28 16:15 Leo Yan
  2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
                   ` (3 more replies)
  0 siblings, 4 replies; 21+ messages in thread
From: Leo Yan @ 2021-05-28 16:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users
  Cc: Leo Yan

This patch series is to correct the pointer usages for the snapshot
mode.

Patch 01 allows the AUX trace in the free run mode and only syncs the
AUX ring buffer when taking snapshot.

Patch 02 is to polish code, it removes the redundant header maintained
in tmc-etr driver and directly uses pointer perf_output_handle::head.

Patch 03 removes the callback cs_etm_find_snapshot() which wrongly
calculates the buffer headers; we can simply use the perf's common
function __auxtrace_mmap__read() for headers calculation.

This patch can be cleanly applied on the mainline kernel with:

  commit 97e5bf604b7a ("Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu")

And it has been tested on Arm64 Juno board.


Leo Yan (3):
  coresight: etm-perf: Correct buffer syncing for snapshot
  coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
  perf cs-etm: Remove callback cs_etm_find_snapshot()

 .../hwtracing/coresight/coresight-etm-perf.c  |  30 +++-
 .../hwtracing/coresight/coresight-etm-perf.h  |   2 +
 .../hwtracing/coresight/coresight-tmc-etr.c   |  10 +-
 tools/perf/arch/arm/util/cs-etm.c             | 133 ------------------
 4 files changed, 32 insertions(+), 143 deletions(-)

-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-05-28 16:15 [PATCH v1 0/3] coresight: Fix for snapshot mode Leo Yan
@ 2021-05-28 16:15 ` Leo Yan
  2021-06-01  9:53   ` James Clark
  2021-06-08 21:41   ` Mathieu Poirier
  2021-05-28 16:15 ` [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer Leo Yan
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 21+ messages in thread
From: Leo Yan @ 2021-05-28 16:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users
  Cc: Leo Yan

The perf tool records the Arm CoreSight trace data with snapshot mode
with the option '-S', when receiving USR2 signal, it is observed the
captured trace data size is very varied: from several MBs to ~20MBs.
This can be reproduced with the command:

  perf record -e cs_etm// -S \
	-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
  PERFPID=$!
  sleep 1
  kill -USR2 $PERFPID

It's different for only specifying option '-S' than options '-a -S'.  If
without option '-a', perf tool creates separate AUX buffers for every
CPU, but the tracer will be enabled only when the profiled program is
scheduled onto the corresponding CPU, this might lead to record very
old trace data when snapshot.

Let's see below diagram:
                                                            snapshot
  CPU0: ______###P1###__________________________________________|
  CPU1: __________________________###P3###____________###P5###__|
  CPU2: ____________________________________###P4###____________|
  CPU3: ________________###P2###________________________________V

In this diagram, the program runs for 5 periods (from P1 to P5), these 5
periods show the task run on different CPUs, e.g. during P1 period the
program runs on CPU0, and during P2 period the program is migrated to
CPU1, and so on.  At the end of P1 period when the program is switched
out from CPU0, the ETR trace data is saved into AUX trace buffer, this
AUX buffer is a dedicated buffer for CPU0's tracer.  With the same
logic, P2's trace data is saved into CPU3's tracer buffer, P4's trace
data is saved into CPU2's buffer, P3 and P5's trace data is saved into
CPU1's.  Therefore, when snapshot, it saves the trace data from all AUX
ring buffers (in this case, it have total 4 AUX ring buffers) into perf
data file.

This is why we can see varied trace data size, it's quite dependent on
the task scheduling on CPUs, if the task is spinned to only one CPU and
without scheduling out, it will only record trace data from only one
AUX trace buffer.  If the task is frequently scheduled in and out, then
it gives more chance to fill trace data into the AUX buffer.

In this example, it also causes the discontinuous trace data.  If P3's
trace data is lost after P5's trace data overwrites the AUX trace data,
thus perf tool fails to record continuous trace data if only have
trace data for P1/P2/P4/P5.

For snapshot mode, usually the user only wants to capture the trace data
for the specific time point and prior to the that point the tracer
should work with free run mode.  This means it's not necessary to
capture trace data for task's scheduling in and out until the perf tool
explicitly disables tracers for snapshot.  This can be fulfilled by
checking the variable "event->ctx->is_active", when the task is
scheduled out this variable is set to zero, and when snapshot this
variable is still non-zero value.  So the driver can only record trace
data only when "event->ctx->is_active" is non-zero.

After applying this change, the perf tool can record the consistent
trace data size for snapshot.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 .../hwtracing/coresight/coresight-etm-perf.c  | 30 +++++++++++++++++--
 .../hwtracing/coresight/coresight-etm-perf.h  |  2 ++
 2 files changed, 29 insertions(+), 3 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 6f398377fec9..fd36d0530087 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -284,6 +284,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
 	if (!event_data)
 		return NULL;
 	INIT_WORK(&event_data->work, free_event_data);
+	event_data->overwrite = overwrite;
 
 	/* First get the selected sink from user space. */
 	if (event->attr.config2) {
@@ -517,9 +518,32 @@ static void etm_event_stop(struct perf_event *event, int mode)
 		if (!sink_ops(sink)->update_buffer)
 			return;
 
-		size = sink_ops(sink)->update_buffer(sink, handle,
-					      event_data->snk_config);
-		perf_aux_output_end(handle, size);
+		/*
+		 * In the snapshot mode, if only specifies option '-S' (note,
+		 * user doesn't specify option '-a' in this case), the AUX ring
+		 * buffers are allocated for every CPU but the AUX trace is
+		 * recorded in per-thread mode.  In this mode, it's needless to
+		 * save AUX trace data into the AUX ring buffer when the
+		 * profiled program is scheduled out from a CPU, alternatively,
+		 * the driver should only capture AUX trace data when the perf
+		 * tool receives USR2 signal for taking snapshot.
+		 *
+		 * The variable "event->ctx->is_active" can be used to
+		 * distinguish the cases between the program scheduling out and
+		 * snapshot: its value is zero when the profiled task scheduled
+		 * out, and it is a non-zero value when perf tool invokes ioctl
+		 * PERF_EVENT_IOC_DISABLE for snapshot.
+		 *
+		 * Only updates AUX ring buffer for snapshot, or for the perf
+		 * session which is not in snapshot mode.
+		 */
+		if (!event_data->overwrite || event->ctx->is_active) {
+			size = sink_ops(sink)->update_buffer(sink, handle,
+						      event_data->snk_config);
+			perf_aux_output_end(handle, size);
+		} else {
+			perf_aux_output_end(handle, 0);
+		}
 	}
 
 	/* Disabling the path make its elements available to other sessions */
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
index 3e4f2ad5e193..2cc3af05495f 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.h
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
@@ -49,12 +49,14 @@ struct etm_filters {
  * @mask:		Hold the CPU(s) this event was set for.
  * @snk_config:		The sink configuration.
  * @path:		An array of path, each slot for one CPU.
+ * @overwrite:		Flag for snapshot mode.
  */
 struct etm_event_data {
 	struct work_struct work;
 	cpumask_t mask;
 	void *snk_config;
 	struct list_head * __percpu *path;
+	bool overwrite;
 };
 
 #if IS_ENABLED(CONFIG_CORESIGHT)
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
  2021-05-28 16:15 [PATCH v1 0/3] coresight: Fix for snapshot mode Leo Yan
  2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
@ 2021-05-28 16:15 ` Leo Yan
  2021-06-10 15:38   ` Suzuki K Poulose
  2021-05-28 16:15 ` [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot() Leo Yan
  2021-06-10  6:43 ` [PATCH v1 0/3] coresight: Fix for snapshot mode Denis Nikitin
  3 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2021-05-28 16:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users
  Cc: Leo Yan

When enable the Arm CoreSight PMU event, the context for AUX ring buffer
is prepared in the structure perf_output_handle, and its field "head"
points the head of the AUX ring buffer and it is updated after filling
AUX trace data into buffer.

Current code uses an extra field etr_perf_buffer::head to maintain the
header for the AUX ring buffer, thus it's not necessary and it's better
to directly perf_output_handle::head.

This patch removes the header etr_perf_buffer::head and directly used
perf_output_handle::head as the header for AUX ring buffer.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 10 +++-------
 1 file changed, 3 insertions(+), 7 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index acdb59e0e661..b22823d67680 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -32,7 +32,6 @@ struct etr_flat_buf {
  * @etr_buf		- Actual buffer used by the ETR
  * @pid			- The PID this etr_perf_buffer belongs to.
  * @snaphost		- Perf session mode
- * @head		- handle->head at the beginning of the session.
  * @nr_pages		- Number of pages in the ring buffer.
  * @pages		- Array of Pages in the ring buffer.
  */
@@ -41,7 +40,6 @@ struct etr_perf_buffer {
 	struct etr_buf		*etr_buf;
 	pid_t			pid;
 	bool			snapshot;
-	unsigned long		head;
 	int			nr_pages;
 	void			**pages;
 };
@@ -1437,16 +1435,16 @@ static void tmc_free_etr_buffer(void *config)
  * buffer to the perf ring buffer.
  */
 static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
+				     unsigned long head,
 				     unsigned long src_offset,
 				     unsigned long to_copy)
 {
 	long bytes;
 	long pg_idx, pg_offset;
-	unsigned long head = etr_perf->head;
 	char **dst_pages, *src_buf;
 	struct etr_buf *etr_buf = etr_perf->etr_buf;
 
-	head = etr_perf->head;
+	head = PERF_IDX2OFF(head, etr_perf);
 	pg_idx = head >> PAGE_SHIFT;
 	pg_offset = head & (PAGE_SIZE - 1);
 	dst_pages = (char **)etr_perf->pages;
@@ -1553,7 +1551,7 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
 	/* Insert barrier packets at the beginning, if there was an overflow */
 	if (lost)
 		tmc_etr_buf_insert_barrier_packet(etr_buf, offset);
-	tmc_etr_sync_perf_buffer(etr_perf, offset, size);
+	tmc_etr_sync_perf_buffer(etr_perf, handle->head, offset, size);
 
 	/*
 	 * In snapshot mode we simply increment the head by the number of byte
@@ -1605,8 +1603,6 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
 		goto unlock_out;
 	}
 
-	etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf);
-
 	/*
 	 * No HW configuration is needed if the sink is already in
 	 * use for this session.
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot()
  2021-05-28 16:15 [PATCH v1 0/3] coresight: Fix for snapshot mode Leo Yan
  2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
  2021-05-28 16:15 ` [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer Leo Yan
@ 2021-05-28 16:15 ` Leo Yan
  2021-06-22 14:29   ` James Clark
  2021-06-24 16:46   ` James Clark
  2021-06-10  6:43 ` [PATCH v1 0/3] coresight: Fix for snapshot mode Denis Nikitin
  3 siblings, 2 replies; 21+ messages in thread
From: Leo Yan @ 2021-05-28 16:15 UTC (permalink / raw)
  To: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users
  Cc: Leo Yan

The callback cs_etm_find_snapshot() is invoked for snapshot mode, its
main purpose is to find the correct AUX trace data and returns "head"
and "old" (we can call "old" as "old head") to the caller, the caller
__auxtrace_mmap__read() uses these two pointers to decide the AUX trace
data size.

cs_etm_find_snapshot() should be removed with below reasons:

- The first thing in cs_etm_find_snapshot() is to check if the head has
  wrapped around, if it is not, directly bails out.  The checking is
  pointless, this is because the "head" and "old" pointers both are
  monotonical increasing so they never wrap around.

- cs_etm_find_snapshot() adjusts the "head" and "old" pointers and
  assumes the AUX ring buffer is fully filled with the hardware trace
  data, so it always calculates the difference "mm->len" between "head"
  and "old".  Let's imagine the snapshot is taken in very short
  interval, the tracers only fill a small chunk of the trace data into
  the AUX ring buffer, in this case, it's wrongly to copy the whole the
  AUX ring buffer to perf file.

- As the "head" and "old" pointers are monotonically increased, the
  function __auxtrace_mmap__read() handles these two pointers properly.
  It calculates the reminders for these two pointers, and the size is
  clamped to be never more than "snapshot_size".  We can simply reply on
  the function __auxtrace_mmap__read() to calculate the correct result
  for data copying, it's not necessary to add Arm CoreSight specific
  callback.

Signed-off-by: Leo Yan <leo.yan@linaro.org>
---
 tools/perf/arch/arm/util/cs-etm.c | 133 ------------------------------
 1 file changed, 133 deletions(-)

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index d942f118d32c..85168d87b2d7 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -38,8 +38,6 @@ struct cs_etm_recording {
 	struct auxtrace_record	itr;
 	struct perf_pmu		*cs_etm_pmu;
 	struct evlist		*evlist;
-	int			wrapped_cnt;
-	bool			*wrapped;
 	bool			snapshot_mode;
 	size_t			snapshot_size;
 };
@@ -734,135 +732,6 @@ static int cs_etm_info_fill(struct auxtrace_record *itr,
 	return 0;
 }
 
-static int cs_etm_alloc_wrapped_array(struct cs_etm_recording *ptr, int idx)
-{
-	bool *wrapped;
-	int cnt = ptr->wrapped_cnt;
-
-	/* Make @ptr->wrapped as big as @idx */
-	while (cnt <= idx)
-		cnt++;
-
-	/*
-	 * Free'ed in cs_etm_recording_free().  Using realloc() to avoid
-	 * cross compilation problems where the host's system supports
-	 * reallocarray() but not the target.
-	 */
-	wrapped = realloc(ptr->wrapped, cnt * sizeof(bool));
-	if (!wrapped)
-		return -ENOMEM;
-
-	wrapped[cnt - 1] = false;
-	ptr->wrapped_cnt = cnt;
-	ptr->wrapped = wrapped;
-
-	return 0;
-}
-
-static bool cs_etm_buffer_has_wrapped(unsigned char *buffer,
-				      size_t buffer_size, u64 head)
-{
-	u64 i, watermark;
-	u64 *buf = (u64 *)buffer;
-	size_t buf_size = buffer_size;
-
-	/*
-	 * We want to look the very last 512 byte (chosen arbitrarily) in
-	 * the ring buffer.
-	 */
-	watermark = buf_size - 512;
-
-	/*
-	 * @head is continuously increasing - if its value is equal or greater
-	 * than the size of the ring buffer, it has wrapped around.
-	 */
-	if (head >= buffer_size)
-		return true;
-
-	/*
-	 * The value of @head is somewhere within the size of the ring buffer.
-	 * This can be that there hasn't been enough data to fill the ring
-	 * buffer yet or the trace time was so long that @head has numerically
-	 * wrapped around.  To find we need to check if we have data at the very
-	 * end of the ring buffer.  We can reliably do this because mmap'ed
-	 * pages are zeroed out and there is a fresh mapping with every new
-	 * session.
-	 */
-
-	/* @head is less than 512 byte from the end of the ring buffer */
-	if (head > watermark)
-		watermark = head;
-
-	/*
-	 * Speed things up by using 64 bit transactions (see "u64 *buf" above)
-	 */
-	watermark >>= 3;
-	buf_size >>= 3;
-
-	/*
-	 * If we find trace data at the end of the ring buffer, @head has
-	 * been there and has numerically wrapped around at least once.
-	 */
-	for (i = watermark; i < buf_size; i++)
-		if (buf[i])
-			return true;
-
-	return false;
-}
-
-static int cs_etm_find_snapshot(struct auxtrace_record *itr,
-				int idx, struct auxtrace_mmap *mm,
-				unsigned char *data,
-				u64 *head, u64 *old)
-{
-	int err;
-	bool wrapped;
-	struct cs_etm_recording *ptr =
-			container_of(itr, struct cs_etm_recording, itr);
-
-	/*
-	 * Allocate memory to keep track of wrapping if this is the first
-	 * time we deal with this *mm.
-	 */
-	if (idx >= ptr->wrapped_cnt) {
-		err = cs_etm_alloc_wrapped_array(ptr, idx);
-		if (err)
-			return err;
-	}
-
-	/*
-	 * Check to see if *head has wrapped around.  If it hasn't only the
-	 * amount of data between *head and *old is snapshot'ed to avoid
-	 * bloating the perf.data file with zeros.  But as soon as *head has
-	 * wrapped around the entire size of the AUX ring buffer it taken.
-	 */
-	wrapped = ptr->wrapped[idx];
-	if (!wrapped && cs_etm_buffer_has_wrapped(data, mm->len, *head)) {
-		wrapped = true;
-		ptr->wrapped[idx] = true;
-	}
-
-	pr_debug3("%s: mmap index %d old head %zu new head %zu size %zu\n",
-		  __func__, idx, (size_t)*old, (size_t)*head, mm->len);
-
-	/* No wrap has occurred, we can just use *head and *old. */
-	if (!wrapped)
-		return 0;
-
-	/*
-	 * *head has wrapped around - adjust *head and *old to pickup the
-	 * entire content of the AUX buffer.
-	 */
-	if (*head >= mm->len) {
-		*old = *head - mm->len;
-	} else {
-		*head += mm->len;
-		*old = *head - mm->len;
-	}
-
-	return 0;
-}
-
 static int cs_etm_snapshot_start(struct auxtrace_record *itr)
 {
 	struct cs_etm_recording *ptr =
@@ -900,7 +769,6 @@ static void cs_etm_recording_free(struct auxtrace_record *itr)
 	struct cs_etm_recording *ptr =
 			container_of(itr, struct cs_etm_recording, itr);
 
-	zfree(&ptr->wrapped);
 	free(ptr);
 }
 
@@ -928,7 +796,6 @@ struct auxtrace_record *cs_etm_record_init(int *err)
 	ptr->itr.recording_options	= cs_etm_recording_options;
 	ptr->itr.info_priv_size		= cs_etm_info_priv_size;
 	ptr->itr.info_fill		= cs_etm_info_fill;
-	ptr->itr.find_snapshot		= cs_etm_find_snapshot;
 	ptr->itr.snapshot_start		= cs_etm_snapshot_start;
 	ptr->itr.snapshot_finish	= cs_etm_snapshot_finish;
 	ptr->itr.reference		= cs_etm_reference;
-- 
2.25.1


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
@ 2021-06-01  9:53   ` James Clark
  2021-06-01 10:35     ` Leo Yan
  2021-06-08 21:41   ` Mathieu Poirier
  1 sibling, 1 reply; 21+ messages in thread
From: James Clark @ 2021-06-01  9:53 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Alexander Shishkin, John Garry,
	Will Deacon, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Namhyung Kim, Daniel Kiss, Denis Nikitin, coresight,
	linux-arm-kernel, linux-kernel, linux-perf-users



On 28/05/2021 19:15, Leo Yan wrote:
> The perf tool records the Arm CoreSight trace data with snapshot mode
> with the option '-S', when receiving USR2 signal, it is observed the
> captured trace data size is very varied: from several MBs to ~20MBs.
> This can be reproduced with the command:
> 
>   perf record -e cs_etm// -S \
> 	-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>   PERFPID=$!
>   sleep 1
>   kill -USR2 $PERFPID
> 
> It's different for only specifying option '-S' than options '-a -S'.  If
> without option '-a', perf tool creates separate AUX buffers for every
> CPU, but the tracer will be enabled only when the profiled program is
> scheduled onto the corresponding CPU, this might lead to record very
> old trace data when snapshot.
> 
> Let's see below diagram:
>                                                             snapshot
>   CPU0: ______###P1###__________________________________________|
>   CPU1: __________________________###P3###____________###P5###__|
>   CPU2: ____________________________________###P4###____________|
>   CPU3: ________________###P2###________________________________V
> 
> In this diagram, the program runs for 5 periods (from P1 to P5), these 5
> periods show the task run on different CPUs, e.g. during P1 period the
> program runs on CPU0, and during P2 period the program is migrated to
> CPU1, and so on.  At the end of P1 period when the program is switched
> out from CPU0, the ETR trace data is saved into AUX trace buffer, this
> AUX buffer is a dedicated buffer for CPU0's tracer.  With the same
> logic, P2's trace data is saved into CPU3's tracer buffer, P4's trace
> data is saved into CPU2's buffer, P3 and P5's trace data is saved into
> CPU1's.  Therefore, when snapshot, it saves the trace data from all AUX
> ring buffers (in this case, it have total 4 AUX ring buffers) into perf
> data file.

Hi Leo,

I was testing out snapshot mode (without your patch) and I noticed that it
only ever collects from the last CPU. For example on a 4 core system,
the CPU ID of the AUX records and the AUXTRACE buffers is always 3.

This is with systemwide tracing, and running "stress -m 2 -c 2".
Is this something that your patch fixes, or am I doing something wrong, or
is it just a coincidence?

Here's a snippet of the output:

	./perf report -D | grep AUX
	0 0 0x200 [0x168]: PERF_RECORD_AUXTRACE_INFO type: 3
	0 0 0x152248 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0  ref: 0x75e0bdc44ea1bb65  idx: 3  tid: -1  cpu: 3
	3 583600975364460 0x152160 [0x40]: PERF_RECORD_AUX offset: 0x400000 size: 0x400000 flags: 0x2 [O]
	0 0 0x55c950 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x400000  ref: 0x6f506d2d02841da4  idx: 3  tid: -1  cpu: 3
	3 583602209157460 0x55c908 [0x40]: PERF_RECORD_AUX offset: 0x800000 size: 0x400000 flags: 0x2 [O]
	0 0 0x9624d8 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x800000  ref: 0x2d83d30161e1117a  idx: 3  tid: -1  cpu: 3
	3 583602526365800 0x962490 [0x40]: PERF_RECORD_AUX offset: 0xc00000 size: 0x400000 flags: 0x2 [O]
	0 0 0xd65f00 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0xc00000  ref: 0x5013e6e2a3c97c9  idx: 3  tid: -1  cpu: 3
	3 583602714310320 0xd65eb8 [0x40]: PERF_RECORD_AUX offset: 0x1000000 size: 0x400000 flags: 0x2 [O]
	0 0 0x1169be8 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x1000000  ref: 0x67b37e157f843269  idx: 3  tid: -1  cpu: 3
	3 583602874193840 0x1169ba0 [0x40]: PERF_RECORD_AUX offset: 0x1400000 size: 0x400000 flags: 0x2 [O]
	0 0 0x156d550 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x1400000  ref: 0x3cb268b926f22d41  idx: 3  tid: -1  cpu: 3
	3 583603044203980 0x156d508 [0x40]: PERF_RECORD_AUX offset: 0x1800000 size: 0x400000 flags: 0x2 [O]
	0 0 0x1971238 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x1800000  ref: 0x4905e0a21d5d35d7  idx: 3  tid: -1  cpu: 3
	3 583603211393440 0x19711f0 [0x40]: PERF_RECORD_AUX offset: 0x1c00000 size: 0x400000 flags: 0x2 [O]
	0 0 0x1d747e0 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x1c00000  ref: 0x4f8f48007f7d70e7  idx: 3  tid: -1  cpu: 3
	3 583603362643100 0x1d74798 [0x40]: PERF_RECORD_AUX offset: 0x2000000 size: 0x400000 flags: 0x2 [O]
	0 0 0x2178368 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x2000000  ref: 0x770d07213be2d29a  idx: 3  tid: -1  cpu: 3
	3 583603526029900 0x2178320 [0x40]: PERF_RECORD_AUX offset: 0x2400000 size: 0x400000 flags: 0x2 [O]
	0 0 0x257bfb0 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x2400000  ref: 0x2e4ed1454815c13e  idx: 3  tid: -1  cpu: 3
	3 583603687951260 0x257bf68 [0x40]: PERF_RECORD_AUX offset: 0x2800000 size: 0x400000 flags: 0x2 [O]
	0 0 0x297fb18 [0x30]: PERF_RECORD_AUXTRACE size: 0x400000  offset: 0x2800000  ref: 0x644eba01d391129  idx: 3  tid: -1  cpu: 3

Thanks
James

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-06-01  9:53   ` James Clark
@ 2021-06-01 10:35     ` Leo Yan
  2021-06-10 14:54       ` James Clark
  0 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2021-06-01 10:35 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users

Hi James,

On Tue, Jun 01, 2021 at 12:53:16PM +0300, James Clark wrote:

[...]

> Hi Leo,
> 
> I was testing out snapshot mode (without your patch) and I noticed that it
> only ever collects from the last CPU. For example on a 4 core system,
> the CPU ID of the AUX records and the AUXTRACE buffers is always 3.
> 
> This is with systemwide tracing, and running "stress -m 2 -c 2".
> Is this something that your patch fixes, or am I doing something wrong, or
> is it just a coincidence?

No, I think it's quite likely caused by blow code:

static unsigned long
tmc_update_etr_buffer(struct coresight_device *csdev,
                      struct perf_output_handle *handle,
                      void *config)
{
    unsigned long flags, offset, size = 0;

    ...

    /* Don't do anything if another tracer is using this sink */
    if (atomic_read(csdev->refcnt) != 1) {
        spin_unlock_irqrestore(&drvdata->spinlock, flags);
        goto out;
    }

    ...

    return size;
}

When using the system wide tracing, it updates the AUX ring buffer
until the last tracer is stopped.  Thus whis is why it only records
AUX ring buffer for the last CPU.

But this makes sense for me, this is because the last CPU is used to
copy trace data to AUX ring buffer (so the perf event PERF_RECORD_AUX
occurs on CPU3), but when you decode the trace data, you should can
see the activities from other CPUs.

Thanks,
Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
  2021-06-01  9:53   ` James Clark
@ 2021-06-08 21:41   ` Mathieu Poirier
  2021-06-09  1:35     ` Leo Yan
  1 sibling, 1 reply; 21+ messages in thread
From: Mathieu Poirier @ 2021-06-08 21:41 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Denis Nikitin,
	coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On Sat, May 29, 2021 at 12:15:50AM +0800, Leo Yan wrote:
> The perf tool records the Arm CoreSight trace data with snapshot mode
> with the option '-S', when receiving USR2 signal, it is observed the
> captured trace data size is very varied: from several MBs to ~20MBs.
> This can be reproduced with the command:
> 
>   perf record -e cs_etm// -S \
> 	-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
>   PERFPID=$!
>   sleep 1
>   kill -USR2 $PERFPID
> 
> It's different for only specifying option '-S' than options '-a -S'.  If
> without option '-a', perf tool creates separate AUX buffers for every
> CPU, but the tracer will be enabled only when the profiled program is
> scheduled onto the corresponding CPU, this might lead to record very
> old trace data when snapshot.

I'm very perplexed by this paragraph...  The '-a' option means tracing on all
CPUs, which will definitely allocate an AUX buffer per CPU.

> 
> Let's see below diagram:
>                                                             snapshot
>   CPU0: ______###P1###__________________________________________|
>   CPU1: __________________________###P3###____________###P5###__|
>   CPU2: ____________________________________###P4###____________|
>   CPU3: ________________###P2###________________________________V
> 
> In this diagram, the program runs for 5 periods (from P1 to P5), these 5
> periods show the task run on different CPUs, e.g. during P1 period the
> program runs on CPU0, and during P2 period the program is migrated to
> CPU1, and so on.  At the end of P1 period when the program is switched
> out from CPU0, the ETR trace data is saved into AUX trace buffer, this
> AUX buffer is a dedicated buffer for CPU0's tracer.  With the same
> logic, P2's trace data is saved into CPU3's tracer buffer, P4's trace
> data is saved into CPU2's buffer, P3 and P5's trace data is saved into
> CPU1's.  Therefore, when snapshot, it saves the trace data from all AUX
> ring buffers (in this case, it have total 4 AUX ring buffers) into perf
> data file.
>

When a snapshot happens and if the event is currently active, traces
accumulated since the event was _installed_ on the CPU will be copied from the
coresight buffer to the AUX buffer.  Trace data accumulated in other buffers will
already have been copied when the event was scheduled out.

I *think* what is happening here is that if P5 was still active when the
snapshot occurs, we would get trace data from P3 and P5, and not P4.  Is my
understanding correct?

> This is why we can see varied trace data size, it's quite dependent on
> the task scheduling on CPUs, if the task is spinned to only one CPU and
> without scheduling out, it will only record trace data from only one
> AUX trace buffer.  If the task is frequently scheduled in and out, then
> it gives more chance to fill trace data into the AUX buffer.
> 
> In this example, it also causes the discontinuous trace data.  If P3's
> trace data is lost after P5's trace data overwrites the AUX trace data,
> thus perf tool fails to record continuous trace data if only have
> trace data for P1/P2/P4/P5.
> 
> For snapshot mode, usually the user only wants to capture the trace data
> for the specific time point and prior to the that point the tracer
> should work with free run mode.  This means it's not necessary to
> capture trace data for task's scheduling in and out until the perf tool
> explicitly disables tracers for snapshot.  This can be fulfilled by
> checking the variable "event->ctx->is_active", when the task is
> scheduled out this variable is set to zero, and when snapshot this
> variable is still non-zero value.  So the driver can only record trace
> data only when "event->ctx->is_active" is non-zero.

The problem with this approach is that if the AUX buffer is 4MB and the CS
buffer is 1MB then we only record 1MB, but because of N:1 topologies we may not
be able to do better.  I haven't looked at the code yet, something I will do
tomorrow.

> 
> After applying this change, the perf tool can record the consistent
> trace data size for snapshot.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>  .../hwtracing/coresight/coresight-etm-perf.c  | 30 +++++++++++++++++--
>  .../hwtracing/coresight/coresight-etm-perf.h  |  2 ++
>  2 files changed, 29 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 6f398377fec9..fd36d0530087 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -284,6 +284,7 @@ static void *etm_setup_aux(struct perf_event *event, void **pages,
>  	if (!event_data)
>  		return NULL;
>  	INIT_WORK(&event_data->work, free_event_data);
> +	event_data->overwrite = overwrite;
>  
>  	/* First get the selected sink from user space. */
>  	if (event->attr.config2) {
> @@ -517,9 +518,32 @@ static void etm_event_stop(struct perf_event *event, int mode)
>  		if (!sink_ops(sink)->update_buffer)
>  			return;
>  
> -		size = sink_ops(sink)->update_buffer(sink, handle,
> -					      event_data->snk_config);
> -		perf_aux_output_end(handle, size);
> +		/*
> +		 * In the snapshot mode, if only specifies option '-S' (note,
> +		 * user doesn't specify option '-a' in this case), the AUX ring
> +		 * buffers are allocated for every CPU but the AUX trace is
> +		 * recorded in per-thread mode.  In this mode, it's needless to
> +		 * save AUX trace data into the AUX ring buffer when the
> +		 * profiled program is scheduled out from a CPU, alternatively,
> +		 * the driver should only capture AUX trace data when the perf
> +		 * tool receives USR2 signal for taking snapshot.
> +		 *
> +		 * The variable "event->ctx->is_active" can be used to
> +		 * distinguish the cases between the program scheduling out and
> +		 * snapshot: its value is zero when the profiled task scheduled
> +		 * out, and it is a non-zero value when perf tool invokes ioctl
> +		 * PERF_EVENT_IOC_DISABLE for snapshot.
> +		 *
> +		 * Only updates AUX ring buffer for snapshot, or for the perf
> +		 * session which is not in snapshot mode.
> +		 */
> +		if (!event_data->overwrite || event->ctx->is_active) {
> +			size = sink_ops(sink)->update_buffer(sink, handle,
> +						      event_data->snk_config);
> +			perf_aux_output_end(handle, size);
> +		} else {
> +			perf_aux_output_end(handle, 0);
> +		}
>  	}
>  
>  	/* Disabling the path make its elements available to other sessions */
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
> index 3e4f2ad5e193..2cc3af05495f 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.h
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
> @@ -49,12 +49,14 @@ struct etm_filters {
>   * @mask:		Hold the CPU(s) this event was set for.
>   * @snk_config:		The sink configuration.
>   * @path:		An array of path, each slot for one CPU.
> + * @overwrite:		Flag for snapshot mode.
>   */
>  struct etm_event_data {
>  	struct work_struct work;
>  	cpumask_t mask;
>  	void *snk_config;
>  	struct list_head * __percpu *path;
> +	bool overwrite;
>  };
>  
>  #if IS_ENABLED(CONFIG_CORESIGHT)
> -- 
> 2.25.1
> 

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-06-08 21:41   ` Mathieu Poirier
@ 2021-06-09  1:35     ` Leo Yan
  2021-06-09  1:39       ` Leo Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2021-06-09  1:35 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Denis Nikitin,
	coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On Tue, Jun 08, 2021 at 03:41:49PM -0600, Mathieu Poirier wrote:
> On Sat, May 29, 2021 at 12:15:50AM +0800, Leo Yan wrote:
> > The perf tool records the Arm CoreSight trace data with snapshot mode
> > with the option '-S', when receiving USR2 signal, it is observed the
> > captured trace data size is very varied: from several MBs to ~20MBs.
> > This can be reproduced with the command:
> > 
> >   perf record -e cs_etm// -S \
> > 	-- dd if=/dev/zero of=/dev/null > /dev/null 2>&1 &
> >   PERFPID=$!
> >   sleep 1
> >   kill -USR2 $PERFPID
> > 
> > It's different for only specifying option '-S' than options '-a -S'.  If
> > without option '-a', perf tool creates separate AUX buffers for every
> > CPU, but the tracer will be enabled only when the profiled program is
> > scheduled onto the corresponding CPU, this might lead to record very
> > old trace data when snapshot.
> 
> I'm very perplexed by this paragraph...  The '-a' option means tracing on all
> CPUs, which will definitely allocate an AUX buffer per CPU.

Sorry for confusion.  We can understand the mechanism based on two factors:
one factor is AUX buffer allocation (per CPU or single AUX buffer
shared by all CPUs), another factor is what's the occasion to enable tracers.

If with only option '-a -S', as you said, perf allocates an AUX buffer per CPU.
And it always enable ETMs for all CPUs.

If with option '-S', perf also allocates an AUX buffer per CPU, but
it's different for enabling tracers from the option '-a -S';  in this
case, perf works like per-thread mode, the tracer (ETM) is only enabled
when the profiled process is scheduled on a specific CPU.  e.g. when
the task is scheduled on CPU2, then the ETM attached to CPU2 is
enabled for tracing.

> > Let's see below diagram:
> >                                                             snapshot
> >   CPU0: ______###P1###__________________________________________|
> >   CPU1: __________________________###P3###____________###P5###__|
> >   CPU2: ____________________________________###P4###____________|
> >   CPU3: ________________###P2###________________________________V
> > 
> > In this diagram, the program runs for 5 periods (from P1 to P5), these 5
> > periods show the task run on different CPUs, e.g. during P1 period the
> > program runs on CPU0, and during P2 period the program is migrated to
> > CPU1, and so on.  At the end of P1 period when the program is switched
> > out from CPU0, the ETR trace data is saved into AUX trace buffer, this
> > AUX buffer is a dedicated buffer for CPU0's tracer.  With the same
> > logic, P2's trace data is saved into CPU3's tracer buffer, P4's trace
> > data is saved into CPU2's buffer, P3 and P5's trace data is saved into
> > CPU1's.  Therefore, when snapshot, it saves the trace data from all AUX
> > ring buffers (in this case, it have total 4 AUX ring buffers) into perf
> > data file.
> >
> 
> When a snapshot happens and if the event is currently active, traces
> accumulated since the event was _installed_ on the CPU will be copied from the
> coresight buffer to the AUX buffer.  Trace data accumulated in other buffers will
> already have been copied when the event was scheduled out.

Yes.

> I *think* what is happening here is that if P5 was still active when the
> snapshot occurs, we would get trace data from P3 and P5, and not P4.  Is my
> understanding correct?

No, in current code, when the snapshot occurs, we get trace data from
P1 to P5.

Keep in mind that the AUX ring buffer and CoreSight's bounce buffer are
allocated per CPU.  So P1 trace data is stored into AUX buffer 0, P3 and
P5 trace data is stored in AUX buffer 1, and so on.

When the snapshot is taken, it iterates the AUX buffers one bye one
from buffer 0 to buffer 3, see the function
record__auxtrace_read_snapshot_all() in the perf source file builtin-record.c.
Since every AUX buffer contains the trace data, so P1 to P5's trace
data will be recorded into perf data file.

> > This is why we can see varied trace data size, it's quite dependent on
> > the task scheduling on CPUs, if the task is spinned to only one CPU and
> > without scheduling out, it will only record trace data from only one
> > AUX trace buffer.  If the task is frequently scheduled in and out, then
> > it gives more chance to fill trace data into the AUX buffer.
> > 
> > In this example, it also causes the discontinuous trace data.  If P3's
> > trace data is lost after P5's trace data overwrites the AUX trace data,
> > thus perf tool fails to record continuous trace data if only have
> > trace data for P1/P2/P4/P5.
> > 
> > For snapshot mode, usually the user only wants to capture the trace data
> > for the specific time point and prior to the that point the tracer
> > should work with free run mode.  This means it's not necessary to
> > capture trace data for task's scheduling in and out until the perf tool
> > explicitly disables tracers for snapshot.  This can be fulfilled by
> > checking the variable "event->ctx->is_active", when the task is
> > scheduled out this variable is set to zero, and when snapshot this
> > variable is still non-zero value.  So the driver can only record trace
> > data only when "event->ctx->is_active" is non-zero.
> 
> The problem with this approach is that if the AUX buffer is 4MB and the CS
> buffer is 1MB then we only record 1MB, but because of N:1 topologies we may not
> be able to do better.  I haven't looked at the code yet, something I will do
> tomorrow.

Yes, the fixing in this patch copies the trace data is limited by the
CoreSight bounce buffer length.  So if CoreSight buffer length is
1MiB, then we have no chance to allocate 4MiB for AUX buffer.

Seems to me, it's unlikely that we have the AUX buffer with 4MiB and
CoreSight bounce buffer with 1MiB.  I read the function
tmc_alloc_etr_buffer(), in most case, the AUX buffer size and
CoreSight bounce buffer size is consistent.

If have any furter questions, just let me know.  Thanks for reviewing.

Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-06-09  1:35     ` Leo Yan
@ 2021-06-09  1:39       ` Leo Yan
  0 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2021-06-09  1:39 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Arnaldo Carvalho de Melo, Suzuki K Poulose, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Denis Nikitin,
	coresight, linux-arm-kernel, linux-kernel, linux-perf-users

On Wed, Jun 09, 2021 at 09:35:55AM +0800, Leo Yan wrote:

[...]

> > The problem with this approach is that if the AUX buffer is 4MB and the CS
> > buffer is 1MB then we only record 1MB, but because of N:1 topologies we may not
> > be able to do better.  I haven't looked at the code yet, something I will do
> > tomorrow.
> 
> Yes, the fixing in this patch copies the trace data is limited by the
> CoreSight bounce buffer length.  So if CoreSight buffer length is
> 1MiB, then we have no chance to allocate 4MiB for AUX buffer.

Sorry for typo, just correct:

if CoreSight buffer length is 1MiB, then we have no chance to *copy*
4MiB trace data for AUX buffer.

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-05-28 16:15 [PATCH v1 0/3] coresight: Fix for snapshot mode Leo Yan
                   ` (2 preceding siblings ...)
  2021-05-28 16:15 ` [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot() Leo Yan
@ 2021-06-10  6:43 ` Denis Nikitin
  2021-06-10 16:04   ` Suzuki K Poulose
  3 siblings, 1 reply; 21+ messages in thread
From: Denis Nikitin @ 2021-06-10  6:43 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Coresight ML, linux-arm-kernel,
	Linux Kernel Mailing List, linux-perf-users

Hi Leo,

On Fri, May 28, 2021 at 9:16 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> This patch series is to correct the pointer usages for the snapshot
> mode.
>
> Patch 01 allows the AUX trace in the free run mode and only syncs the
> AUX ring buffer when taking snapshot.
>
> Patch 02 is to polish code, it removes the redundant header maintained
> in tmc-etr driver and directly uses pointer perf_output_handle::head.
>
> Patch 03 removes the callback cs_etm_find_snapshot() which wrongly
> calculates the buffer headers; we can simply use the perf's common
> function __auxtrace_mmap__read() for headers calculation.
>
> This patch can be cleanly applied on the mainline kernel with:
>
>   commit 97e5bf604b7a ("Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu")
>
> And it has been tested on Arm64 Juno board.

I have verified the patches on Chrome OS Trogdor device.
They fixed the problem discussed in
https://lists.linaro.org/pipermail/coresight/2021-May/006411.html.

Tested-by: Denis Nikitin <denik@chromium.org>

Thanks,
Denis

>
>
> Leo Yan (3):
>   coresight: etm-perf: Correct buffer syncing for snapshot
>   coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
>   perf cs-etm: Remove callback cs_etm_find_snapshot()
>
>  .../hwtracing/coresight/coresight-etm-perf.c  |  30 +++-
>  .../hwtracing/coresight/coresight-etm-perf.h  |   2 +
>  .../hwtracing/coresight/coresight-tmc-etr.c   |  10 +-
>  tools/perf/arch/arm/util/cs-etm.c             | 133 ------------------
>  4 files changed, 32 insertions(+), 143 deletions(-)
>
> --
> 2.25.1
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-06-01 10:35     ` Leo Yan
@ 2021-06-10 14:54       ` James Clark
  2021-06-11 13:55         ` James Clark
  0 siblings, 1 reply; 21+ messages in thread
From: James Clark @ 2021-06-10 14:54 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users



On 01/06/2021 13:35, Leo Yan wrote:
> Hi James,
> 
> On Tue, Jun 01, 2021 at 12:53:16PM +0300, James Clark wrote:
> 
> [...]
> 
>> Hi Leo,
>>
>> I was testing out snapshot mode (without your patch) and I noticed that it
>> only ever collects from the last CPU. For example on a 4 core system,
>> the CPU ID of the AUX records and the AUXTRACE buffers is always 3.
>>
>> This is with systemwide tracing, and running "stress -m 2 -c 2".
>> Is this something that your patch fixes, or am I doing something wrong, or
>> is it just a coincidence?
> 
> No, I think it's quite likely caused by blow code:
> 
> static unsigned long
> tmc_update_etr_buffer(struct coresight_device *csdev,
>                       struct perf_output_handle *handle,
>                       void *config)
> {
>     unsigned long flags, offset, size = 0;
> 
>     ...
> 
>     /* Don't do anything if another tracer is using this sink */
>     if (atomic_read(csdev->refcnt) != 1) {
>         spin_unlock_irqrestore(&drvdata->spinlock, flags);
>         goto out;
>     }
> 
>     ...
> 
>     return size;
> }
> 
> When using the system wide tracing, it updates the AUX ring buffer
> until the last tracer is stopped.  Thus whis is why it only records
> AUX ring buffer for the last CPU.
> 
> But this makes sense for me, this is because the last CPU is used to
> copy trace data to AUX ring buffer (so the perf event PERF_RECORD_AUX
> occurs on CPU3), but when you decode the trace data, you should can
> see the activities from other CPUs.
> 
> Thanks,
> Leo
> 

I have one more issue around snapshot mode which I'd like to check if it's
related to this patchset.

In "[PATCH v5 0/1] perf cs-etm: Split Coresight decode by aux records",
I added the warning for a missing AUXTRACE buffer as you suggested.
Now this warning gets triggered on the last AUX record when using
snapshot mode. I don't know if you are able to reproduce this.

Thanks
James

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer
  2021-05-28 16:15 ` [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer Leo Yan
@ 2021-06-10 15:38   ` Suzuki K Poulose
  0 siblings, 0 replies; 21+ messages in thread
From: Suzuki K Poulose @ 2021-06-10 15:38 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Mathieu Poirier, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Denis Nikitin,
	coresight, linux-arm-kernel, linux-kernel, linux-perf-users

Hi Leo

On 28/05/2021 17:15, Leo Yan wrote:
> When enable the Arm CoreSight PMU event, the context for AUX ring buffer
> is prepared in the structure perf_output_handle, and its field "head"
> points the head of the AUX ring buffer and it is updated after filling
> AUX trace data into buffer.
> 
> Current code uses an extra field etr_perf_buffer::head to maintain the
> header for the AUX ring buffer, thus it's not necessary and it's better
> to directly perf_output_handle::head.
> 
> This patch removes the header etr_perf_buffer::head and directly used
> perf_output_handle::head as the header for AUX ring buffer.
> 
> Signed-off-by: Leo Yan <leo.yan@linaro.org>
> ---
>   drivers/hwtracing/coresight/coresight-tmc-etr.c | 10 +++-------
>   1 file changed, 3 insertions(+), 7 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index acdb59e0e661..b22823d67680 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -32,7 +32,6 @@ struct etr_flat_buf {
>    * @etr_buf		- Actual buffer used by the ETR
>    * @pid			- The PID this etr_perf_buffer belongs to.
>    * @snaphost		- Perf session mode
> - * @head		- handle->head at the beginning of the session.
>    * @nr_pages		- Number of pages in the ring buffer.
>    * @pages		- Array of Pages in the ring buffer.
>    */
> @@ -41,7 +40,6 @@ struct etr_perf_buffer {
>   	struct etr_buf		*etr_buf;
>   	pid_t			pid;
>   	bool			snapshot;
> -	unsigned long		head;
>   	int			nr_pages;
>   	void			**pages;
>   };
> @@ -1437,16 +1435,16 @@ static void tmc_free_etr_buffer(void *config)
>    * buffer to the perf ring buffer.
>    */
>   static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> +				     unsigned long head,
>   				     unsigned long src_offset,
>   				     unsigned long to_copy)
>   {
>   	long bytes;
>   	long pg_idx, pg_offset;
> -	unsigned long head = etr_perf->head;
>   	char **dst_pages, *src_buf;
>   	struct etr_buf *etr_buf = etr_perf->etr_buf;
>   
> -	head = etr_perf->head;
> +	head = PERF_IDX2OFF(head, etr_perf);
>   	pg_idx = head >> PAGE_SHIFT;
>   	pg_offset = head & (PAGE_SIZE - 1);
>   	dst_pages = (char **)etr_perf->pages;
> @@ -1553,7 +1551,7 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
>   	/* Insert barrier packets at the beginning, if there was an overflow */
>   	if (lost)
>   		tmc_etr_buf_insert_barrier_packet(etr_buf, offset);
> -	tmc_etr_sync_perf_buffer(etr_perf, offset, size);
> +	tmc_etr_sync_perf_buffer(etr_perf, handle->head, offset, size);
>   
>   	/*
>   	 * In snapshot mode we simply increment the head by the number of byte
> @@ -1605,8 +1603,6 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
>   		goto unlock_out;
>   	}
>   
> -	etr_perf->head = PERF_IDX2OFF(handle->head, etr_perf);
> -
>   	/*
>   	 * No HW configuration is needed if the sink is already in
>   	 * use for this session.
> 

This looks good to me and could avoid any potential issues
with stale offset cached in the etr_perf_buffer.

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-06-10  6:43 ` [PATCH v1 0/3] coresight: Fix for snapshot mode Denis Nikitin
@ 2021-06-10 16:04   ` Suzuki K Poulose
  2021-06-11  8:31     ` Denis Nikitin
  0 siblings, 1 reply; 21+ messages in thread
From: Suzuki K Poulose @ 2021-06-10 16:04 UTC (permalink / raw)
  To: Denis Nikitin, Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Coresight ML,
	linux-arm-kernel, Linux Kernel Mailing List, linux-perf-users

Hi Denis

On 10/06/2021 07:43, Denis Nikitin wrote:
> Hi Leo,
> 
> On Fri, May 28, 2021 at 9:16 AM Leo Yan <leo.yan@linaro.org> wrote:
>>
>> This patch series is to correct the pointer usages for the snapshot
>> mode.
>>
>> Patch 01 allows the AUX trace in the free run mode and only syncs the
>> AUX ring buffer when taking snapshot.
>>
>> Patch 02 is to polish code, it removes the redundant header maintained
>> in tmc-etr driver and directly uses pointer perf_output_handle::head.
>>
>> Patch 03 removes the callback cs_etm_find_snapshot() which wrongly
>> calculates the buffer headers; we can simply use the perf's common
>> function __auxtrace_mmap__read() for headers calculation.
>>
>> This patch can be cleanly applied on the mainline kernel with:
>>
>>    commit 97e5bf604b7a ("Merge branch 'for-5.13-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/dennis/percpu")
>>
>> And it has been tested on Arm64 Juno board.
> 
> I have verified the patches on Chrome OS Trogdor device.
> They fixed the problem discussed in
> https://lists.linaro.org/pipermail/coresight/2021-May/006411.html.

Are you able to confirm if the patch 3 alone fixes the above issue ?
I am not convinced that Patch 1 is necessary.

Suzuki

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-06-10 16:04   ` Suzuki K Poulose
@ 2021-06-11  8:31     ` Denis Nikitin
  2021-06-12  3:27       ` Leo Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Denis Nikitin @ 2021-06-11  8:31 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Leo Yan, Arnaldo Carvalho de Melo, Mathieu Poirier, Mike Leach,
	Alexander Shishkin, John Garry, Will Deacon, Peter Zijlstra,
	Ingo Molnar, Jiri Olsa, Namhyung Kim, Daniel Kiss, Coresight ML,
	linux-arm-kernel, Linux Kernel Mailing List, linux-perf-users

Hi Suzuki,

On Thu, Jun 10, 2021 at 9:04 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
>
[...]
>
> Are you able to confirm if the patch 3 alone fixes the above issue ?
> I am not convinced that Patch 1 is necessary.
>

Yes. "perf cs-etm: Remove callback cs_etm_find_snapshot()" alone fixes
the issue.

- Denis

> Suzuki

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot
  2021-06-10 14:54       ` James Clark
@ 2021-06-11 13:55         ` James Clark
  0 siblings, 0 replies; 21+ messages in thread
From: James Clark @ 2021-06-11 13:55 UTC (permalink / raw)
  To: Leo Yan
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users



On 10/06/2021 17:54, James Clark wrote:
> 
> 
> On 01/06/2021 13:35, Leo Yan wrote:
>> Hi James,
>>
>> On Tue, Jun 01, 2021 at 12:53:16PM +0300, James Clark wrote:
>>
>> [...]
>>
>>> Hi Leo,
>>>
>>> I was testing out snapshot mode (without your patch) and I noticed that it
>>> only ever collects from the last CPU. For example on a 4 core system,
>>> the CPU ID of the AUX records and the AUXTRACE buffers is always 3.
>>>
>>> This is with systemwide tracing, and running "stress -m 2 -c 2".
>>> Is this something that your patch fixes, or am I doing something wrong, or
>>> is it just a coincidence?
>>
>> No, I think it's quite likely caused by blow code:
>>
>> static unsigned long
>> tmc_update_etr_buffer(struct coresight_device *csdev,
>>                       struct perf_output_handle *handle,
>>                       void *config)
>> {
>>     unsigned long flags, offset, size = 0;
>>
>>     ...
>>
>>     /* Don't do anything if another tracer is using this sink */
>>     if (atomic_read(csdev->refcnt) != 1) {
>>         spin_unlock_irqrestore(&drvdata->spinlock, flags);
>>         goto out;
>>     }
>>
>>     ...
>>
>>     return size;
>> }
>>
>> When using the system wide tracing, it updates the AUX ring buffer
>> until the last tracer is stopped.  Thus whis is why it only records
>> AUX ring buffer for the last CPU.
>>
>> But this makes sense for me, this is because the last CPU is used to
>> copy trace data to AUX ring buffer (so the perf event PERF_RECORD_AUX
>> occurs on CPU3), but when you decode the trace data, you should can
>> see the activities from other CPUs.
>>
>> Thanks,
>> Leo
>>
> 
> I have one more issue around snapshot mode which I'd like to check if it's
> related to this patchset.
> 
> In "[PATCH v5 0/1] perf cs-etm: Split Coresight decode by aux records",
> I added the warning for a missing AUXTRACE buffer as you suggested.
> Now this warning gets triggered on the last AUX record when using
> snapshot mode. I don't know if you are able to reproduce this.
> 

Hi Leo,

So I tested this set with my aux split patch, and I still get the warning about the
last AUX record not being found, so this is an independent issue. Whether
it could be (or needs to be fixed) at the same time, I'm not sure.

One thing seems to have improved with your set is the number of aux records
produced. For each SIGUSR2, I now get one aux record. Previously I got a random
number that didn't seem to match up, which didn't seem right to me.

But one thing that could be worse is the offsets and sizes. Now the size of the
aux records is always 0x100000, no matter what the -m arguments are, and the size
of the AUX record exceeds the size of the buffer. This makes the split patchset
fall back to processing the whole buffer because it never finds a buffer that the
AUX record fits in. For example:

	0 0 0xad0 [0x30]: PERF_RECORD_AUXTRACE size: 0x20000  offset: 0xe0000  ref: 0x406971eb0346c919  idx: 2  tid: 6794  cpu: 2
	2 5644375601060 0xa88 [0x40]: PERF_RECORD_AUX offset: 0x100000 size: 0x100000 flags: 0x2 [O]

The buffer is only 0x20000 long, but the aux record is 0x100000.

Another issue is that now the offsets don't match up:
	0 0 0xad0 [0x30]: PERF_RECORD_AUXTRACE size: 0x20000
  offset: 0xe0000  ref: 0x406971eb0346c919  idx: 2  tid: 6794  cpu: 2
	2 5644375601060 0xa88 [0x40]: PERF_RECORD_AUX offset: 0x100000 size: 0x100000 flags: 0x2 [O]
	0 0 0x20bb0 [0x30]: PERF_RECORD_AUXTRACE size: 0x20000  offset: 0x1e0000  ref: 0x1a3abb5c2407536a  idx: 2  tid: 6794  cpu: 2
	2 5644942278600 0x20b68 [0x40]: PERF_RECORD_AUX offset: 0x200000 size: 0x100000 flags: 0x2 [O]
	0 0 0x40c90 [0x30]: PERF_RECORD_AUXTRACE size: 0x20000  offset: 0x2e0000  ref: 0x7477d3d43d0a2ac7  idx: 2  tid: 6794  cpu: 2
	2 5645263266480 0x40c48 [0x40]: PERF_RECORD_AUX offset: 0x300000 size: 0x100000 flags: 0x2 [O]
	0 0 0x60d70 [0x30]: PERF_RECORD_AUXTRACE size: 0x20000  offset: 0x3e0000  ref: 0x202b939740c4f041  idx: 2  tid: 6794  cpu: 2
	2 5645569318020 0x60d28 [0x40]: PERF_RECORD_AUX offset: 0x400000 size: 0x100000 flags: 0x2 [O]

The buffers are offset by 0xe0000, but the aux records are on round 0x100000 offsets.

Maybe we need to re-think the aux split patchset, do we not need to split in snapshot mode? Or can we fix this
set to produce aux records that match up?

James

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-06-11  8:31     ` Denis Nikitin
@ 2021-06-12  3:27       ` Leo Yan
  2021-06-21  8:21         ` Denis Nikitin
  0 siblings, 1 reply; 21+ messages in thread
From: Leo Yan @ 2021-06-12  3:27 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Suzuki K Poulose, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Coresight ML, linux-arm-kernel,
	Linux Kernel Mailing List, linux-perf-users

On Fri, Jun 11, 2021 at 01:31:41AM -0700, Denis Nikitin wrote:
> Hi Suzuki,
> 
> On Thu, Jun 10, 2021 at 9:04 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> >
> [...]
> >
> > Are you able to confirm if the patch 3 alone fixes the above issue ?
> > I am not convinced that Patch 1 is necessary.
> >
> 
> Yes. "perf cs-etm: Remove callback cs_etm_find_snapshot()" alone fixes
> the issue.

Based on current testing result, we should give high priority for
patches 2 and 3.

The patch 1 is controversial for how to handle the trace data kept
in multiple AUX buffers; essentially it's up to how we understand the
snapshot definition.  I confirmed Intel-PT and CoreSight have the same
behaviour for capturing trace data from multiple AUX ring buffers when
snapshot occurs.

I'd like to leave patch 1/3 out, and resend it if we get conclusion.
At the meantime, @Denis, if you have observed any profiling result
(or profiling quality) difference caused by patch 1, the feedback would
be very valuable.

Thanks a lot for Denis' testing and insight review from Suzuki!

Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-06-12  3:27       ` Leo Yan
@ 2021-06-21  8:21         ` Denis Nikitin
  2021-06-22 12:58           ` Leo Yan
  0 siblings, 1 reply; 21+ messages in thread
From: Denis Nikitin @ 2021-06-21  8:21 UTC (permalink / raw)
  To: Leo Yan
  Cc: Suzuki K Poulose, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Coresight ML, linux-arm-kernel,
	Linux Kernel Mailing List, linux-perf-users

Hi Leo,

On Fri, Jun 11, 2021 at 5:27 PM Leo Yan <leo.yan@linaro.org> wrote:
>
> On Fri, Jun 11, 2021 at 01:31:41AM -0700, Denis Nikitin wrote:
> > Hi Suzuki,
> >
> > On Thu, Jun 10, 2021 at 9:04 AM Suzuki K Poulose <suzuki.poulose@arm.com> wrote:
> > >
> > [...]
> > >
> > > Are you able to confirm if the patch 3 alone fixes the above issue ?
> > > I am not convinced that Patch 1 is necessary.
> > >
> >
> > Yes. "perf cs-etm: Remove callback cs_etm_find_snapshot()" alone fixes
> > the issue.
>
> Based on current testing result, we should give high priority for
> patches 2 and 3.
>
> The patch 1 is controversial for how to handle the trace data kept
> in multiple AUX buffers; essentially it's up to how we understand the
> snapshot definition.  I confirmed Intel-PT and CoreSight have the same
> behaviour for capturing trace data from multiple AUX ring buffers when
> snapshot occurs.
>
> I'd like to leave patch 1/3 out, and resend it if we get conclusion.
> At the meantime, @Denis, if you have observed any profiling result
> (or profiling quality) difference caused by patch 1, the feedback would
> be very valuable.


I evaluated AutoFDO profiles with benchmarks but I was only focused
on the system-wide mode. And as I understood patch 1 fixes the issue
in non system-wide mode.
Currently I'm OoO so I won't be able to do further evaluation.

- Denis

>
>
> Thanks a lot for Denis' testing and insight review from Suzuki!
>
> Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 0/3] coresight: Fix for snapshot mode
  2021-06-21  8:21         ` Denis Nikitin
@ 2021-06-22 12:58           ` Leo Yan
  0 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2021-06-22 12:58 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Suzuki K Poulose, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Coresight ML, linux-arm-kernel,
	Linux Kernel Mailing List, linux-perf-users

Hi Denis,

On Sun, Jun 20, 2021 at 10:21:57PM -1000, Denis Nikitin wrote:

[...]

> > I'd like to leave patch 1/3 out, and resend it if we get conclusion.
> > At the meantime, @Denis, if you have observed any profiling result
> > (or profiling quality) difference caused by patch 1, the feedback would
> > be very valuable.
> 
> I evaluated AutoFDO profiles with benchmarks but I was only focused
> on the system-wide mode. And as I understood patch 1 fixes the issue
> in non system-wide mode.

Yes, patch 1 doesn't work for the system-wide mode.  In the system-wide
mode, CoreSight driver has its own reference counter to only allow the
last CPU to fill trace data to AUX buffer.

> Currently I'm OoO so I won't be able to do further evaluation.

No problem, thanks for the feedback!

Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot()
  2021-05-28 16:15 ` [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot() Leo Yan
@ 2021-06-22 14:29   ` James Clark
  2021-06-28  1:31     ` Leo Yan
  2021-06-24 16:46   ` James Clark
  1 sibling, 1 reply; 21+ messages in thread
From: James Clark @ 2021-06-22 14:29 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Alexander Shishkin, John Garry,
	Will Deacon, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Namhyung Kim, Daniel Kiss, Denis Nikitin, coresight,
	linux-arm-kernel, linux-kernel, linux-perf-users



On 28/05/2021 17:15, Leo Yan wrote:
> The callback cs_etm_find_snapshot() is invoked for snapshot mode, its
> main purpose is to find the correct AUX trace data and returns "head"
> and "old" (we can call "old" as "old head") to the caller, the caller
> __auxtrace_mmap__read() uses these two pointers to decide the AUX trace
> data size.
> 
> cs_etm_find_snapshot() should be removed with below reasons:

Hi Leo,

I came across this other comment in coresight-tmc-etr.c that should probably
be fixed up if we remove cs_etm_find_snapshot(). The same is duplicated in a
few other files:

	/*
	 * In snapshot mode we simply increment the head by the number of byte
	 * that were written.  User space function  cs_etm_find_snapshot() will
	 * figure out how many bytes to get from the AUX buffer based on the
	 * position of the head.
	 */
	if (etr_perf->snapshot)
		handle->head += size;

I'm still reviewing patch 1 and 2 as well.

James


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot()
  2021-05-28 16:15 ` [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot() Leo Yan
  2021-06-22 14:29   ` James Clark
@ 2021-06-24 16:46   ` James Clark
  1 sibling, 0 replies; 21+ messages in thread
From: James Clark @ 2021-06-24 16:46 UTC (permalink / raw)
  To: Leo Yan, Arnaldo Carvalho de Melo, Mathieu Poirier,
	Suzuki K Poulose, Mike Leach, Alexander Shishkin, John Garry,
	Will Deacon, Peter Zijlstra, Ingo Molnar, Jiri Olsa,
	Namhyung Kim, Daniel Kiss, Denis Nikitin, coresight,
	linux-arm-kernel, linux-kernel, linux-perf-users



On 28/05/2021 17:15, Leo Yan wrote:
> The callback cs_etm_find_snapshot() is invoked for snapshot mode, its
> main purpose is to find the correct AUX trace data and returns "head"
> and "old" (we can call "old" as "old head") to the caller, the caller
> __auxtrace_mmap__read() uses these two pointers to decide the AUX trace
> data size.
> > cs_etm_find_snapshot() should be removed with below reasons:
> 
> - The first thing in cs_etm_find_snapshot() is to check if the head has
>   wrapped around, if it is not, directly bails out.  The checking is
>   pointless, this is because the "head" and "old" pointers both are
>   monotonical increasing so they never wrap around.
> 
For patch 3:

Reviewed-by: James Clark <james.clark@arm.com>
Tested-by: James Clark <james.clark@arm.com>

I think it's a good simplification and it also fixes the duplicate buffers
issue. And I agree with the reasoning about the pointer increasing monotonically.


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot()
  2021-06-22 14:29   ` James Clark
@ 2021-06-28  1:31     ` Leo Yan
  0 siblings, 0 replies; 21+ messages in thread
From: Leo Yan @ 2021-06-28  1:31 UTC (permalink / raw)
  To: James Clark
  Cc: Arnaldo Carvalho de Melo, Mathieu Poirier, Suzuki K Poulose,
	Mike Leach, Alexander Shishkin, John Garry, Will Deacon,
	Peter Zijlstra, Ingo Molnar, Jiri Olsa, Namhyung Kim,
	Daniel Kiss, Denis Nikitin, coresight, linux-arm-kernel,
	linux-kernel, linux-perf-users

Hi James,

On Tue, Jun 22, 2021 at 03:29:35PM +0100, James Clark wrote:
> 
> 
> On 28/05/2021 17:15, Leo Yan wrote:
> > The callback cs_etm_find_snapshot() is invoked for snapshot mode, its
> > main purpose is to find the correct AUX trace data and returns "head"
> > and "old" (we can call "old" as "old head") to the caller, the caller
> > __auxtrace_mmap__read() uses these two pointers to decide the AUX trace
> > data size.
> > 
> > cs_etm_find_snapshot() should be removed with below reasons:
> 
> Hi Leo,
> 
> I came across this other comment in coresight-tmc-etr.c that should probably
> be fixed up if we remove cs_etm_find_snapshot(). The same is duplicated in a
> few other files:
> 
> 	/*
> 	 * In snapshot mode we simply increment the head by the number of byte
> 	 * that were written.  User space function  cs_etm_find_snapshot() will
> 	 * figure out how many bytes to get from the AUX buffer based on the
> 	 * position of the head.
> 	 */
> 	if (etr_perf->snapshot)
> 		handle->head += size;

Good finding!  I will fix the comments in next version.

Thanks a lot for the reviewing and testing!

Leo

^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2021-06-28  1:31 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-05-28 16:15 [PATCH v1 0/3] coresight: Fix for snapshot mode Leo Yan
2021-05-28 16:15 ` [PATCH v1 1/3] coresight: etm-perf: Correct buffer syncing for snapshot Leo Yan
2021-06-01  9:53   ` James Clark
2021-06-01 10:35     ` Leo Yan
2021-06-10 14:54       ` James Clark
2021-06-11 13:55         ` James Clark
2021-06-08 21:41   ` Mathieu Poirier
2021-06-09  1:35     ` Leo Yan
2021-06-09  1:39       ` Leo Yan
2021-05-28 16:15 ` [PATCH v1 2/3] coresight: tmc-etr: Use perf_output_handle::head for AUX ring buffer Leo Yan
2021-06-10 15:38   ` Suzuki K Poulose
2021-05-28 16:15 ` [PATCH v1 3/3] perf cs-etm: Remove callback cs_etm_find_snapshot() Leo Yan
2021-06-22 14:29   ` James Clark
2021-06-28  1:31     ` Leo Yan
2021-06-24 16:46   ` James Clark
2021-06-10  6:43 ` [PATCH v1 0/3] coresight: Fix for snapshot mode Denis Nikitin
2021-06-10 16:04   ` Suzuki K Poulose
2021-06-11  8:31     ` Denis Nikitin
2021-06-12  3:27       ` Leo Yan
2021-06-21  8:21         ` Denis Nikitin
2021-06-22 12:58           ` Leo Yan

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).