linux-arm-kernel.lists.infradead.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/4] coresight: Add ETR-PERF polling.
@ 2021-04-21 12:04 Daniel Kiss
  2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
                   ` (5 more replies)
  0 siblings, 6 replies; 35+ messages in thread
From: Daniel Kiss @ 2021-04-21 12:04 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Daniel Kiss

This series adds a feature to ETR-PERF that sync the ETR buffer to perf
periodically. This is really handy when the system wide trace is used
because in this case the perf won't sync during the trace. In a per-thread
setup the traced program might not go to the kernel frequvently enought
to collect trace. Polling helps in both usecases. Can be used with strobing.
Tuning polling period is challanging, I'm working on an additional patch
that adds some metrics to help tune the polling period.

Daniel Kiss (4):
  coresight: tmc-etr: Advance buffer pointer in sync buffer.
  coresight: tmc-etr: Track perf handler.
  coresight: etm-perf: Export etm_event_cpu_path.
  coresight: Add ETR-PERF polling.

 .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
 drivers/hwtracing/coresight/Makefile          |   2 +-
 .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
 .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
 .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
 .../coresight/coresight-etr-perf-polling.h    |  42 +++
 .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
 .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
 drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
 9 files changed, 401 insertions(+), 4 deletions(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
 create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h

-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
@ 2021-04-21 12:04 ` Daniel Kiss
  2021-04-23  8:23   ` Leo Yan
  2021-04-26 10:40   ` Suzuki K Poulose
  2021-04-21 12:04 ` [PATCH 2/4] coresight: tmc-etr: Track perf handler Daniel Kiss
                   ` (4 subsequent siblings)
  5 siblings, 2 replies; 35+ messages in thread
From: Daniel Kiss @ 2021-04-21 12:04 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Daniel Kiss, Branislav Rankov

With polling the sync could called multiple times in a row. Without this
change the data will be overwritten at the beginning of the buffer.

Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index ea5027e397d02..dd19d1d1c3b38 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
 {
 	long bytes;
 	long pg_idx, pg_offset;
-	unsigned long head = etr_perf->head;
+	unsigned long head;
 	char **dst_pages, *src_buf;
 	struct etr_buf *etr_buf = etr_perf->etr_buf;
 
@@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
 		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
 					     &src_buf);
 		if (WARN_ON_ONCE(bytes <= 0))
-			break;
+			return;
 		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
 
 		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
@@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
 		/* Move source pointers */
 		src_offset += bytes;
 	}
+	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
 }
 
 /*
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 2/4] coresight: tmc-etr: Track perf handler.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
  2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
@ 2021-04-21 12:04 ` Daniel Kiss
  2021-04-23  9:20   ` Leo Yan
  2021-04-21 12:04 ` [PATCH 3/4] coresight: etm-perf: Export etm_event_cpu_path Daniel Kiss
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 35+ messages in thread
From: Daniel Kiss @ 2021-04-21 12:04 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Daniel Kiss, Branislav Rankov

Keep track of the perf handler that is registred by the first tracer.
This will be used by the update call from polling.

Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
---
 drivers/hwtracing/coresight/coresight-tmc-etr.c | 8 ++++++++
 drivers/hwtracing/coresight/coresight-tmc.h     | 2 ++
 2 files changed, 10 insertions(+)

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index dd19d1d1c3b38..bf9f6311d8663 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1511,6 +1511,12 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
 		goto out;
 	}
 
+	/* Serve only the tracer with the right handler */
+	if (drvdata->perf_handle != handle) {
+		spin_unlock_irqrestore(&drvdata->spinlock, flags);
+		goto out;
+	}
+
 	if (WARN_ON(drvdata->perf_buf != etr_buf)) {
 		lost = true;
 		spin_unlock_irqrestore(&drvdata->spinlock, flags);
@@ -1623,6 +1629,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
 		drvdata->pid = pid;
 		drvdata->mode = CS_MODE_PERF;
 		drvdata->perf_buf = etr_perf->etr_buf;
+		drvdata->perf_handle = handle;
 		atomic_inc(csdev->refcnt);
 	}
 
@@ -1670,6 +1677,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
 	drvdata->mode = CS_MODE_DISABLED;
 	/* Reset perf specific data */
 	drvdata->perf_buf = NULL;
+	drvdata->perf_handle = NULL;
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc.h b/drivers/hwtracing/coresight/coresight-tmc.h
index b91ec7dde7bc9..81583ffb973dc 100644
--- a/drivers/hwtracing/coresight/coresight-tmc.h
+++ b/drivers/hwtracing/coresight/coresight-tmc.h
@@ -184,6 +184,7 @@ struct etr_buf {
  * @idr_mutex:	Access serialisation for idr.
  * @sysfs_buf:	SYSFS buffer for ETR.
  * @perf_buf:	PERF buffer for ETR.
+ * @perf_handle: PERF handle for ETR.
  */
 struct tmc_drvdata {
 	void __iomem		*base;
@@ -207,6 +208,7 @@ struct tmc_drvdata {
 	struct mutex		idr_mutex;
 	struct etr_buf		*sysfs_buf;
 	struct etr_buf		*perf_buf;
+	struct perf_output_handle *perf_handle;
 };
 
 struct etr_buf_operations {
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 3/4] coresight: etm-perf: Export etm_event_cpu_path.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
  2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
  2021-04-21 12:04 ` [PATCH 2/4] coresight: tmc-etr: Track perf handler Daniel Kiss
@ 2021-04-21 12:04 ` Daniel Kiss
  2021-04-21 12:04 ` [PATCH 4/4] coresight: Add ETR-PERF polling Daniel Kiss
                   ` (2 subsequent siblings)
  5 siblings, 0 replies; 35+ messages in thread
From: Daniel Kiss @ 2021-04-21 12:04 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Daniel Kiss, Branislav Rankov

Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
---
 drivers/hwtracing/coresight/coresight-etm-perf.c | 2 +-
 drivers/hwtracing/coresight/coresight-etm-perf.h | 1 +
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index f123c26b9f549..78a55fc2bcab5 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -117,7 +117,7 @@ etm_event_cpu_path_ptr(struct etm_event_data *data, int cpu)
 	return per_cpu_ptr(data->path, cpu);
 }
 
-static inline struct list_head *
+struct list_head *
 etm_event_cpu_path(struct etm_event_data *data, int cpu)
 {
 	return *etm_event_cpu_path_ptr(data, cpu);
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.h b/drivers/hwtracing/coresight/coresight-etm-perf.h
index 3e4f2ad5e193d..6d02abbad1b0f 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.h
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.h
@@ -69,6 +69,7 @@ static inline void *etm_perf_sink_config(struct perf_output_handle *handle)
 		return data->snk_config;
 	return NULL;
 }
+struct list_head *etm_event_cpu_path(struct etm_event_data *data, int cpu);
 #else
 static inline int etm_perf_symlink(struct coresight_device *csdev, bool link)
 { return -EINVAL; }
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* [PATCH 4/4] coresight: Add ETR-PERF polling.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
                   ` (2 preceding siblings ...)
  2021-04-21 12:04 ` [PATCH 3/4] coresight: etm-perf: Export etm_event_cpu_path Daniel Kiss
@ 2021-04-21 12:04 ` Daniel Kiss
  2021-04-26  1:18   ` Leo Yan
  2021-05-05  7:21   ` Denis Nikitin
  2021-04-26 17:54 ` [PATCH 0/4] " Mathieu Poirier
  2021-04-27 16:24 ` James Clark
  5 siblings, 2 replies; 35+ messages in thread
From: Daniel Kiss @ 2021-04-21 12:04 UTC (permalink / raw)
  To: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Daniel Kiss, Branislav Rankov

ETR might fill up the buffer sooner than an event makes perf to trigger
the synchronisation especially in system wide trace. Polling runs
periodically to sync the ETR buffer. Period is configurable via sysfs,
disabled by default.

Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
---
 .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
 drivers/hwtracing/coresight/Makefile          |   2 +-
 .../hwtracing/coresight/coresight-etm-perf.c  |   8 +
 .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
 .../coresight/coresight-etr-perf-polling.h    |  42 +++
 .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
 .../hwtracing/coresight/coresight-tmc-etr.c   |   9 +
 7 files changed, 386 insertions(+), 1 deletion(-)
 create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
 create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h

diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
index 6aa527296c710..4ca7af22a3686 100644
--- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
+++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
@@ -91,3 +91,11 @@ Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
 Description:	(RW) Size of the trace buffer for TMC-ETR when used in SYSFS
 		mode. Writable only for TMC-ETR configurations. The value
 		should be aligned to the kernel pagesize.
+
+What:		/sys/bus/coresight/devices/<memory_map>.tmc/polling/period
+Date:		April 2021
+KernelVersion:	5.13
+Contact:	Daniel Kiss <daniel.kiss@arm.com>
+Description:	(RW) Time in milliseconds when the TMC-ETR is synced.
+		Default value is 0, means the feature is disabled.
+		Writable only for TMC-ETR configurations.
diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
index d60816509755c..4df90b71d98cd 100644
--- a/drivers/hwtracing/coresight/Makefile
+++ b/drivers/hwtracing/coresight/Makefile
@@ -4,7 +4,7 @@
 #
 obj-$(CONFIG_CORESIGHT) += coresight.o
 coresight-y := coresight-core.o  coresight-etm-perf.o coresight-platform.o \
-		coresight-sysfs.o
+		coresight-sysfs.o  coresight-etr-perf-polling.o
 obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
 coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
 		      coresight-tmc-etr.o
diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
index 78a55fc2bcab5..910a99944eea8 100644
--- a/drivers/hwtracing/coresight/coresight-etm-perf.c
+++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
@@ -19,6 +19,7 @@
 #include <linux/workqueue.h>
 
 #include "coresight-etm-perf.h"
+#include "coresight-etr-perf-polling.h"
 #include "coresight-priv.h"
 
 static struct pmu etm_pmu;
@@ -438,6 +439,8 @@ static void etm_event_start(struct perf_event *event, int flags)
 	/* Tell the perf core the event is alive */
 	event->hw.state = 0;
 
+	etr_perf_polling_event_start(event, event_data, handle);
+
 	/* Finally enable the tracer */
 	if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
 		goto fail_disable_path;
@@ -497,6 +500,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
 	if (!sink)
 		return;
 
+	etr_perf_polling_event_stop(event, event_data);
+
 	/* stop tracer */
 	source_ops(csdev)->disable(csdev, event);
 
@@ -741,6 +746,8 @@ int __init etm_perf_init(void)
 	etm_pmu.addr_filters_validate	= etm_addr_filters_validate;
 	etm_pmu.nr_addr_filters		= ETM_ADDR_CMP_MAX;
 
+	etr_perf_polling_init();
+
 	ret = perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
 	if (ret == 0)
 		etm_perf_up = true;
@@ -750,5 +757,6 @@ int __init etm_perf_init(void)
 
 void __exit etm_perf_exit(void)
 {
+	etr_perf_polling_exit();
 	perf_pmu_unregister(&etm_pmu);
 }
diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.c b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
new file mode 100644
index 0000000000000..aa0352908873a
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
@@ -0,0 +1,316 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright(C) 2021 Arm Limited. All rights reserved.
+ * Author: Daniel Kiss <daniel.kiss@arm.com>
+ */
+
+#include <linux/coresight.h>
+#include <linux/coresight-pmu.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/init.h>
+#include <linux/list.h>
+#include <linux/percpu-defs.h>
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/stringhash.h>
+#include <linux/types.h>
+#include <linux/workqueue.h>
+
+#include "coresight-etr-perf-polling.h"
+#include "coresight-priv.h"
+#include "coresight-tmc.h"
+
+#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
+    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)
+
+struct polling_event_list {
+	struct perf_event *perf_event;
+	struct etm_event_data *etm_event_data;
+	struct perf_output_handle *ctx_handle;
+	void (*tmc_etr_reset_hw)(struct tmc_drvdata *);
+	struct list_head list;
+};
+
+struct polling {
+	int cpu;
+	struct list_head polled_events;
+	struct delayed_work delayed_work;
+};
+
+static atomic_t period;
+static spinlock_t spinlock_re;
+static struct list_head registered_events;
+
+static DEFINE_PER_CPU(struct polling, polling);
+
+static ssize_t period_show(struct device *dev, struct device_attribute *attr,
+			   char *buf)
+{
+	int temp;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
+		return -EPERM;
+
+	temp = atomic_read(&period);
+	return sprintf(buf, "%i\n", temp);
+}
+
+static ssize_t period_store(struct device *dev, struct device_attribute *attr,
+			    const char *buf, size_t count)
+{
+	int temp = 0;
+	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
+
+	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
+		return -EPERM;
+
+	if ((1 == sscanf(buf, "%i", &temp)) && (temp >= 0))
+		atomic_set(&period, temp);
+	return count;
+}
+
+static DEVICE_ATTR_RW(period);
+
+static struct attribute *coresight_tmc_polling_attrs[] = {
+	&dev_attr_period.attr,
+	NULL,
+};
+const struct attribute_group coresight_tmc_polling_group = {
+	.attrs = coresight_tmc_polling_attrs,
+	.name = "polling",
+};
+EXPORT_SYMBOL_GPL(coresight_tmc_polling_group);
+
+static inline void polling_sched_worker(struct polling *p)
+{
+	int tickrate = atomic_read(&period);
+	if (!list_empty(&p->polled_events) && (tickrate > 0))
+		schedule_delayed_work_on(p->cpu, &p->delayed_work,
+					 msecs_to_jiffies(tickrate));
+}
+
+static inline bool is_etr_related(struct etm_event_data *etm_event_data, int cpu)
+{
+	struct list_head *path;
+	struct coresight_device *sink;
+	struct tmc_drvdata *drvdata;
+	path = etm_event_cpu_path(etm_event_data, cpu);
+	if (WARN_ON(!path))
+		return false;
+	sink = coresight_get_sink(path);
+	if (WARN_ON(!sink))
+		return false;
+	drvdata = dev_get_drvdata(sink->dev.parent);
+	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
+		return false;
+	return true;
+}
+
+/*
+ * Adds the event to the polled events list.
+ */
+void etr_perf_polling_event_start(struct perf_event *event,
+				  struct etm_event_data *etm_event_data,
+				  struct perf_output_handle *ctx_handle)
+{
+	int cpu = smp_processor_id();
+	struct polling *p = per_cpu_ptr(&polling, cpu);
+	struct polling_event_list *element;
+	struct list_head *i, *tmp;
+
+	if (!is_etr_related(etm_event_data, cpu))
+		return;
+
+	spin_lock(&spinlock_re);
+	list_for_each_safe (i, tmp, &registered_events) {
+		element = list_entry(i, struct polling_event_list, list);
+		if (element->ctx_handle == ctx_handle) {
+			element->perf_event = event;
+			element->etm_event_data = etm_event_data;
+			list_del(&element->list);
+			spin_unlock(&spinlock_re);
+			list_add(&element->list, &p->polled_events);
+			polling_sched_worker(p);
+			return;
+		}
+	}
+	spin_unlock(&spinlock_re);
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_event_start);
+
+/*
+ * Removes the event from the to be polled events list.
+ */
+void etr_perf_polling_event_stop(struct perf_event *event,
+				 struct etm_event_data *etm_event_data)
+{
+	int cpu = smp_processor_id();
+	struct list_head *i, *tmp;
+	struct polling *p = per_cpu_ptr(&polling, cpu);
+
+	if (!is_etr_related(etm_event_data, cpu))
+		return;
+
+	list_for_each_safe (i, tmp, &p->polled_events) {
+		struct polling_event_list *element =
+			list_entry(i, struct polling_event_list, list);
+		if (element->perf_event == event) {
+			list_del(&element->list);
+			element->perf_event = NULL;
+			element->etm_event_data = NULL;
+			spin_lock(&spinlock_re);
+			list_add(&element->list, &registered_events);
+			spin_unlock(&spinlock_re);
+			if (list_empty(&p->polled_events)) {
+				cancel_delayed_work(&p->delayed_work);
+			}
+			return;
+		}
+	}
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_event_stop);
+
+/*
+ * The polling worker is a workqueue job which is periodically
+ * woken up to update the perf aux buffer from the etr shrink.
+ */
+static void etr_perf_polling_worker(struct work_struct *work)
+{
+	unsigned long flags;
+	int cpu = smp_processor_id();
+	struct polling *p = per_cpu_ptr(&polling, cpu);
+	struct list_head *i, *tmp;
+
+	if (!atomic_read(&period))
+		return;
+
+	/*
+	 * Scheduling would do the same from the perf hooks,
+	 * this should be done in one go.
+	 */
+	local_irq_save(flags);
+	preempt_disable();
+	/* Perf requires rcu lock. */
+	rcu_read_lock();
+
+	polling_sched_worker(p);
+
+	list_for_each_safe (i, tmp, &p->polled_events) {
+		struct list_head *path;
+		struct coresight_device *sink;
+		struct polling_event_list *element =
+			list_entry(i, struct polling_event_list, list);
+
+		path = etm_event_cpu_path(element->etm_event_data, cpu);
+		if (WARN_ON(!path))
+			continue;
+		sink = coresight_get_sink(path);
+		if (WARN_ON(!sink))
+			continue;
+		if (sink_ops(sink)->update_buffer) {
+			int size, refcnt;
+			struct tmc_drvdata *drvdata = dev_get_drvdata(sink->dev.parent);
+
+			/*
+			 * Act as now we are the only users of the sink. Due to the locks
+			 * we are safe.
+			 */
+			refcnt = atomic_xchg(sink->refcnt, 1);
+			size = sink_ops(sink)->update_buffer(
+				sink, element->ctx_handle,
+				element->etm_event_data->snk_config);
+			refcnt = atomic_xchg(sink->refcnt, refcnt);
+			/*
+			 * Restart the trace.
+			 */
+			if (element->tmc_etr_reset_hw)
+				element->tmc_etr_reset_hw(drvdata);
+
+			WARN_ON(size < 0);
+			if (size > 0) {
+				struct etm_event_data *new_event_data;
+
+				perf_aux_output_end(element->ctx_handle, size);
+				new_event_data = perf_aux_output_begin(
+					element->ctx_handle,
+					element->perf_event);
+				if (WARN_ON(new_event_data == NULL))
+					continue;
+				element->etm_event_data = new_event_data;
+				WARN_ON(new_event_data->snk_config !=
+					element->etm_event_data->snk_config);
+			}
+		}
+	}
+
+	rcu_read_unlock();
+	preempt_enable();
+	local_irq_restore(flags);
+}
+
+void etr_perf_polling_handle_register(struct perf_output_handle *handle,
+				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata))
+{
+	struct polling_event_list *element;
+
+	element = kmalloc(sizeof(*element), GFP_KERNEL);
+	if (WARN_ON(!element))
+		return;
+	memset(element, 0, sizeof(*element));
+	element->ctx_handle = handle;
+	element->tmc_etr_reset_hw = tmc_etr_reset_hw;
+	spin_lock(&spinlock_re);
+	list_add(&element->list, &registered_events);
+	spin_unlock(&spinlock_re);
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_handle_register);
+
+void etr_perf_polling_handle_deregister(struct perf_output_handle *handle)
+{
+	struct list_head *i, *tmp;
+
+	spin_lock(&spinlock_re);
+	list_for_each_safe (i, tmp, &registered_events) {
+		struct polling_event_list *element =
+			list_entry(i, struct polling_event_list, list);
+		if (element->ctx_handle == handle) {
+			list_del(&element->list);
+			spin_unlock(&spinlock_re);
+			kfree(element);
+			return;
+		}
+	}
+	spin_unlock(&spinlock_re);
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_handle_deregister);
+
+void etr_perf_polling_init(void)
+{
+	int cpu;
+	spin_lock_init(&spinlock_re);
+	INIT_LIST_HEAD(&registered_events);
+	atomic_set(&period, 0);
+	for_each_possible_cpu (cpu) {
+		struct polling *p = per_cpu_ptr(&polling, cpu);
+		p->cpu = cpu;
+		INIT_LIST_HEAD(&p->polled_events);
+		INIT_DELAYED_WORK(&p->delayed_work, etr_perf_polling_worker);
+	}
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_init);
+
+void etr_perf_polling_exit(void)
+{
+	int cpu;
+	for_each_possible_cpu (cpu) {
+		struct polling *p = per_cpu_ptr(&polling, cpu);
+		cancel_delayed_work_sync(&p->delayed_work);
+		WARN_ON(!list_empty(&p->polled_events));
+	}
+	WARN_ON(!list_empty(&registered_events));
+}
+EXPORT_SYMBOL_GPL(etr_perf_polling_exit);
+
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.h b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
new file mode 100644
index 0000000000000..5917e1fa408bb
--- /dev/null
+++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
@@ -0,0 +1,42 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright(C) 2021 Arm Limited. All rights reserved.
+ * Author: Daniel Kiss <daniel.kiss@arm.com>
+ */
+
+#ifndef _CORESIGHT_ETM_PERF_POLLING_H
+#define _CORESIGHT_ETM_PERF_POLLING_H
+
+#include <linux/coresight.h>
+#include <linux/perf_event.h>
+#include "coresight-etm-perf.h"
+#include "coresight-tmc.h"
+
+#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
+    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)
+
+void etr_perf_polling_init(void);
+void etr_perf_polling_exit(void);
+void etr_perf_polling_handle_register(struct perf_output_handle *handle,
+				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata));
+void etr_perf_polling_handle_deregister(struct perf_output_handle *handle);
+void etr_perf_polling_event_start(struct perf_event *event,
+				  struct etm_event_data *etm_event_data,
+				  struct perf_output_handle *ctx_handle);
+void etr_perf_polling_event_stop(struct perf_event *event,
+				 struct etm_event_data *etm_event_data);
+
+extern const struct attribute_group coresight_tmc_polling_group;
+#define CORESIGHT_TMP_POLLING_GROUP &coresight_tmc_polling_group,
+
+#else /* !CONFIG_CORESIGHT_LINK_AND_SINK_TMC */
+#define etr_perf_polling_init()
+#define etr_perf_polling_exit()
+#define etr_perf_polling_handle_register(...)
+#define etr_perf_polling_handle_deregister(...)
+#define etr_perf_polling_event_start(...)
+#define etr_perf_polling_event_stop(...)
+#define CORESIGHT_TMP_POLLING_GROUP
+#endif
+
+#endif
diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
index 74c6323d4d6ab..51e705ef3ffa3 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-core.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
@@ -26,6 +26,7 @@
 
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
+#include "coresight-etr-perf-polling.h"
 
 DEFINE_CORESIGHT_DEVLIST(etb_devs, "tmc_etb");
 DEFINE_CORESIGHT_DEVLIST(etf_devs, "tmc_etf");
@@ -365,6 +366,7 @@ static const struct attribute_group coresight_tmc_mgmt_group = {
 static const struct attribute_group *coresight_tmc_groups[] = {
 	&coresight_tmc_group,
 	&coresight_tmc_mgmt_group,
+	CORESIGHT_TMP_POLLING_GROUP
 	NULL,
 };
 
diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index bf9f6311d8663..021b594e38e71 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -16,6 +16,7 @@
 #include <linux/vmalloc.h>
 #include "coresight-catu.h"
 #include "coresight-etm-perf.h"
+#include "coresight-etr-perf-polling.h"
 #include "coresight-priv.h"
 #include "coresight-tmc.h"
 
@@ -1139,6 +1140,12 @@ void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
 	drvdata->etr_buf = NULL;
 }
 
+static void tmc_etr_reset_hw(struct tmc_drvdata *drvdata)
+{
+	__tmc_etr_disable_hw(drvdata);
+	__tmc_etr_enable_hw(drvdata);
+}
+
 static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
 {
 	int ret = 0;
@@ -1630,6 +1637,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
 		drvdata->mode = CS_MODE_PERF;
 		drvdata->perf_buf = etr_perf->etr_buf;
 		drvdata->perf_handle = handle;
+		etr_perf_polling_handle_register(handle, tmc_etr_reset_hw);
 		atomic_inc(csdev->refcnt);
 	}
 
@@ -1677,6 +1685,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
 	drvdata->mode = CS_MODE_DISABLED;
 	/* Reset perf specific data */
 	drvdata->perf_buf = NULL;
+	etr_perf_polling_handle_deregister(drvdata->perf_handle);
 	drvdata->perf_handle = NULL;
 
 	spin_unlock_irqrestore(&drvdata->spinlock, flags);
-- 
2.25.1


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
@ 2021-04-23  8:23   ` Leo Yan
  2021-04-26 10:40   ` Suzuki K Poulose
  1 sibling, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-04-23  8:23 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On Wed, Apr 21, 2021 at 02:04:10PM +0200, Daniel Kiss wrote:
> With polling the sync could called multiple times in a row. Without this
> change the data will be overwritten at the beginning of the buffer.
> 
> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>

From the last week, I spent time to dig into AUX trace buffer, thus
this patch set gives me a good chance to apply my learning for perf
ring buffer.

For this patch:

Reviewed-by: Leo Yan <leo.yan@linaro.org>


> ---
>  drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index ea5027e397d02..dd19d1d1c3b38 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>  {
>  	long bytes;
>  	long pg_idx, pg_offset;
> -	unsigned long head = etr_perf->head;
> +	unsigned long head;
>  	char **dst_pages, *src_buf;
>  	struct etr_buf *etr_buf = etr_perf->etr_buf;
>  
> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>  		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>  					     &src_buf);
>  		if (WARN_ON_ONCE(bytes <= 0))
> -			break;
> +			return;
>  		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>  
>  		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>  		/* Move source pointers */
>  		src_offset += bytes;
>  	}
> +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>  }
>  
>  /*
> -- 
> 2.25.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/4] coresight: tmc-etr: Track perf handler.
  2021-04-21 12:04 ` [PATCH 2/4] coresight: tmc-etr: Track perf handler Daniel Kiss
@ 2021-04-23  9:20   ` Leo Yan
  2021-04-26  0:25     ` Leo Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Leo Yan @ 2021-04-23  9:20 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

Hi Daniel,

On Wed, Apr 21, 2021 at 02:04:11PM +0200, Daniel Kiss wrote:

[...]

> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index dd19d1d1c3b38..bf9f6311d8663 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1511,6 +1511,12 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
>  		goto out;
>  	}
>  
> +	/* Serve only the tracer with the right handler */
> +	if (drvdata->perf_handle != handle) {
> +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> +		goto out;
> +	}
> +

I have concern for this change, Let's use the system-wide tracing as
an example.

If a system have 4 CPUs, for the perf with system wide tracing, the
tool maps the AUX ring buffers for four times, but the CoreSight
driver only allocates pages once and maps these physical pages for
four times to user space.  Therefore, the perf tool in the userspace
manages 4 AUX ring buffers, every AUX ring buffer is served for one
CPU.

The confusion between the CoreSight driver (in the kernel) and the
perf tool (in the userspace) is: there actually has only one ring
buffer for the enabled sink (let's say ETR), but there have four ring
buffer control structures, the control structure is
'perf_event_mmap_page' which is resident in the first page for perf's
general ring buffer (please note, this ring buffer is different from
AUX ring buffer).

IIUC, this patch only allows the first CPU which enables coresight path
to update the AUX ring buffer.  This can break the case:

  - Step 1: perf tool opens ETM event; we can use the command:

    # perf record -o ${perfdata} -e cs_etm/@tmc_etr0/ -a
           -- dd if=/dev/zero of=/dev/null

  - Step 2: the profiled program "dd" is firstly schedued in CPU0, so
    its "perf_handle" will be assigned to "drvdata->perf_handle";

  - Step 3: if the program "dd" is migrated to CPU1 and it never runs
    on CPU0 afterwards, then this patch will prevent to update the AUX
    ring buffer, due to the "drvdata->perf_handle" cannot match with
    CPU1's handler.

On the other hand, I think we should change to always stick to the
same "perf_output_handle" for all CPUs, thus it can allow all CPUs
to use the same structure 'perf_event_mmap_page' for AUX ring buffer
management.

[...]

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 2/4] coresight: tmc-etr: Track perf handler.
  2021-04-23  9:20   ` Leo Yan
@ 2021-04-26  0:25     ` Leo Yan
  0 siblings, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-04-26  0:25 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On Fri, Apr 23, 2021 at 05:20:38PM +0800, Leo Yan wrote:
> Hi Daniel,
> 
> On Wed, Apr 21, 2021 at 02:04:11PM +0200, Daniel Kiss wrote:
> 
> [...]
> 
> > diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > index dd19d1d1c3b38..bf9f6311d8663 100644
> > --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> > @@ -1511,6 +1511,12 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
> >  		goto out;
> >  	}
> >  
> > +	/* Serve only the tracer with the right handler */
> > +	if (drvdata->perf_handle != handle) {
> > +		spin_unlock_irqrestore(&drvdata->spinlock, flags);
> > +		goto out;
> > +	}
> > +
> 
> I have concern for this change, Let's use the system-wide tracing as
> an example.
> 
> If a system have 4 CPUs, for the perf with system wide tracing, the
> tool maps the AUX ring buffers for four times, but the CoreSight
> driver only allocates pages once and maps these physical pages for
> four times to user space.  Therefore, the perf tool in the userspace
> manages 4 AUX ring buffers, every AUX ring buffer is served for one
> CPU.
> 
> The confusion between the CoreSight driver (in the kernel) and the
> perf tool (in the userspace) is: there actually has only one ring
> buffer for the enabled sink (let's say ETR), but there have four ring
> buffer control structures, the control structure is
> 'perf_event_mmap_page' which is resident in the first page for perf's
> general ring buffer (please note, this ring buffer is different from
> AUX ring buffer).
> 
> IIUC, this patch only allows the first CPU which enables coresight path
> to update the AUX ring buffer.  This can break the case:
> 
>   - Step 1: perf tool opens ETM event; we can use the command:
> 
>     # perf record -o ${perfdata} -e cs_etm/@tmc_etr0/ -a
>            -- dd if=/dev/zero of=/dev/null
> 
>   - Step 2: the profiled program "dd" is firstly schedued in CPU0, so
>     its "perf_handle" will be assigned to "drvdata->perf_handle";
> 
>   - Step 3: if the program "dd" is migrated to CPU1 and it never runs
>     on CPU0 afterwards, then this patch will prevent to update the AUX
>     ring buffer, due to the "drvdata->perf_handle" cannot match with
>     CPU1's handler.

Want to clarify, this case only happens with "snapshot" mode; With
Mathieu's reminding, "snapshot" mode is quite special: it creates AUX
ring buffer per CPU, but when enable the tracing, if without
specifying the option "-a" for system wide tracing, it only enables
ETM tracer for a CPU when the profiled program is scheduled on that CPU.

To avoid over complexsity, let's give this low priority and firstly
focus on the system-wide tracing case.

Thanks,
Leo

> On the other hand, I think we should change to always stick to the
> same "perf_output_handle" for all CPUs, thus it can allow all CPUs
> to use the same structure 'perf_event_mmap_page' for AUX ring buffer
> management.

> 
> [...]
> 
> Thanks,
> Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/4] coresight: Add ETR-PERF polling.
  2021-04-21 12:04 ` [PATCH 4/4] coresight: Add ETR-PERF polling Daniel Kiss
@ 2021-04-26  1:18   ` Leo Yan
  2021-05-05  7:21   ` Denis Nikitin
  1 sibling, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-04-26  1:18 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On Wed, Apr 21, 2021 at 02:04:13PM +0200, Daniel Kiss wrote:
> ETR might fill up the buffer sooner than an event makes perf to trigger
> the synchronisation especially in system wide trace. Polling runs
> periodically to sync the ETR buffer. Period is configurable via sysfs,
> disabled by default.
> 
> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
> ---
>  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>  drivers/hwtracing/coresight/Makefile          |   2 +-
>  .../hwtracing/coresight/coresight-etm-perf.c  |   8 +
>  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>  .../coresight/coresight-etr-perf-polling.h    |  42 +++
>  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>  .../hwtracing/coresight/coresight-tmc-etr.c   |   9 +
>  7 files changed, 386 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> index 6aa527296c710..4ca7af22a3686 100644
> --- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> +++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> @@ -91,3 +91,11 @@ Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
>  Description:	(RW) Size of the trace buffer for TMC-ETR when used in SYSFS
>  		mode. Writable only for TMC-ETR configurations. The value
>  		should be aligned to the kernel pagesize.
> +
> +What:		/sys/bus/coresight/devices/<memory_map>.tmc/polling/period
> +Date:		April 2021
> +KernelVersion:	5.13
> +Contact:	Daniel Kiss <daniel.kiss@arm.com>
> +Description:	(RW) Time in milliseconds when the TMC-ETR is synced.
> +		Default value is 0, means the feature is disabled.
> +		Writable only for TMC-ETR configurations.
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index d60816509755c..4df90b71d98cd 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -4,7 +4,7 @@
>  #
>  obj-$(CONFIG_CORESIGHT) += coresight.o
>  coresight-y := coresight-core.o  coresight-etm-perf.o coresight-platform.o \
> -		coresight-sysfs.o
> +		coresight-sysfs.o  coresight-etr-perf-polling.o
>  obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
>  coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
>  		      coresight-tmc-etr.o
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 78a55fc2bcab5..910a99944eea8 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -19,6 +19,7 @@
>  #include <linux/workqueue.h>
>  
>  #include "coresight-etm-perf.h"
> +#include "coresight-etr-perf-polling.h"
>  #include "coresight-priv.h"
>  
>  static struct pmu etm_pmu;
> @@ -438,6 +439,8 @@ static void etm_event_start(struct perf_event *event, int flags)
>  	/* Tell the perf core the event is alive */
>  	event->hw.state = 0;
>  
> +	etr_perf_polling_event_start(event, event_data, handle);
> +
>  	/* Finally enable the tracer */
>  	if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
>  		goto fail_disable_path;
> @@ -497,6 +500,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
>  	if (!sink)
>  		return;
>  
> +	etr_perf_polling_event_stop(event, event_data);
> +
>  	/* stop tracer */
>  	source_ops(csdev)->disable(csdev, event);
>  
> @@ -741,6 +746,8 @@ int __init etm_perf_init(void)
>  	etm_pmu.addr_filters_validate	= etm_addr_filters_validate;
>  	etm_pmu.nr_addr_filters		= ETM_ADDR_CMP_MAX;
>  
> +	etr_perf_polling_init();
> +
>  	ret = perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>  	if (ret == 0)
>  		etm_perf_up = true;
> @@ -750,5 +757,6 @@ int __init etm_perf_init(void)
>  
>  void __exit etm_perf_exit(void)
>  {
> +	etr_perf_polling_exit();
>  	perf_pmu_unregister(&etm_pmu);
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.c b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> new file mode 100644
> index 0000000000000..aa0352908873a
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> @@ -0,0 +1,316 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright(C) 2021 Arm Limited. All rights reserved.
> + * Author: Daniel Kiss <daniel.kiss@arm.com>
> + */
> +
> +#include <linux/coresight.h>
> +#include <linux/coresight-pmu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/perf_event.h>
> +#include <linux/slab.h>
> +#include <linux/stringhash.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
> +
> +#include "coresight-etr-perf-polling.h"
> +#include "coresight-priv.h"
> +#include "coresight-tmc.h"
> +
> +#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
> +    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)

It's good to add a new config "CONFIG_CORESIGHT_ETM_PERF_POLL" and
developers can selectively enable it when build kernel.

> +struct polling_event_list {
> +	struct perf_event *perf_event;
> +	struct etm_event_data *etm_event_data;
> +	struct perf_output_handle *ctx_handle;
> +	void (*tmc_etr_reset_hw)(struct tmc_drvdata *);
> +	struct list_head list;
> +};
> +
> +struct polling {
> +	int cpu;
> +	struct list_head polled_events;

Based the structure definition, every CPU has its own polling
structure, and there have multiple polled event for one CPU.  In
theory, should every CPU have only one perf event for polling?

If so, it's not necessary to create event list for every CPU; in other
words, we can create a list which can be used to maintain all events
cross all CPUs.

> +	struct delayed_work delayed_work;
> +};

Every CPU has its own delayed work, a potential issue is it's hard to
synchronize within CPUs; and after the CPU number increases, the
situation will get worse.  For example, when polling for multiple CPUs,
there might have no chance to stop all tracers attached to CPUs.

I understand this patch simply captures the trace data for the first CPU
which reigstered its handler in the driver; IOW, we have no chance to
stop all tracers to record a clean tracing data (here "clean" means
there have no mixed behaviours that one CPU is reading trace data from
ETR buffer and tracers from other CPUs still write data into the ETR
buffer).

> +static atomic_t period;
> +static spinlock_t spinlock_re;
> +static struct list_head registered_events;
> +
> +static DEFINE_PER_CPU(struct polling, polling);
> +
> +static ssize_t period_show(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	int temp;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return -EPERM;

IMHO, the checking is redundant.

> +
> +	temp = atomic_read(&period);
> +	return sprintf(buf, "%i\n", temp);
> +}
> +
> +static ssize_t period_store(struct device *dev, struct device_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	int temp = 0;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return -EPERM;

Ditto.

> +
> +	if ((1 == sscanf(buf, "%i", &temp)) && (temp >= 0))
> +		atomic_set(&period, temp);
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(period);
> +
> +static struct attribute *coresight_tmc_polling_attrs[] = {
> +	&dev_attr_period.attr,
> +	NULL,
> +};
> +const struct attribute_group coresight_tmc_polling_group = {
> +	.attrs = coresight_tmc_polling_attrs,
> +	.name = "polling",
> +};
> +EXPORT_SYMBOL_GPL(coresight_tmc_polling_group);

Don't need to export the symbol.

> +
> +static inline void polling_sched_worker(struct polling *p)
> +{
> +	int tickrate = atomic_read(&period);
> +	if (!list_empty(&p->polled_events) && (tickrate > 0))
> +		schedule_delayed_work_on(p->cpu, &p->delayed_work,
> +					 msecs_to_jiffies(tickrate));
> +}
> +
> +static inline bool is_etr_related(struct etm_event_data *etm_event_data, int cpu)
> +{
> +	struct list_head *path;
> +	struct coresight_device *sink;
> +	struct tmc_drvdata *drvdata;
> +	path = etm_event_cpu_path(etm_event_data, cpu);
> +	if (WARN_ON(!path))
> +		return false;
> +	sink = coresight_get_sink(path);
> +	if (WARN_ON(!sink))
> +		return false;
> +	drvdata = dev_get_drvdata(sink->dev.parent);
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return false;
> +	return true;
> +}

Understand these operations are from etm_event_start(); we can avoid
the duplicate code if arrange code more reasonable.

I suggest below flow:

  - From the function etm_setup_aux(), insert an AUX event into the
    polling list;
  - From the function etm_event_start(), enable the AUX event, so this
    AUX event will be polled periodically;
  - From the function etm_event_stop(), disable the AUX event, the
    event will not be polled anymore;
  - From the function etm_free_AUX(), remove the AUX event from the
    polling list.

> +
> +/*
> + * Adds the event to the polled events list.
> + */
> +void etr_perf_polling_event_start(struct perf_event *event,
> +				  struct etm_event_data *etm_event_data,
> +				  struct perf_output_handle *ctx_handle)
> +{
> +	int cpu = smp_processor_id();
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +	struct polling_event_list *element;
> +	struct list_head *i, *tmp;
> +
> +	if (!is_etr_related(etm_event_data, cpu))
> +		return;
> +
> +	spin_lock(&spinlock_re);
> +	list_for_each_safe (i, tmp, &registered_events) {

Suprious space before parenthese; and better to use
list_for_each_entry_safe().

> +		element = list_entry(i, struct polling_event_list, list);
> +		if (element->ctx_handle == ctx_handle) {
> +			element->perf_event = event;
> +			element->etm_event_data = etm_event_data;
> +			list_del(&element->list);
> +			spin_unlock(&spinlock_re);
> +			list_add(&element->list, &p->polled_events);
> +			polling_sched_worker(p);
> +			return;
> +		}
> +	}
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_event_start);

No need to export the symbol.

> +
> +/*
> + * Removes the event from the to be polled events list.
> + */
> +void etr_perf_polling_event_stop(struct perf_event *event,
> +				 struct etm_event_data *etm_event_data)
> +{
> +	int cpu = smp_processor_id();
> +	struct list_head *i, *tmp;
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +
> +	if (!is_etr_related(etm_event_data, cpu))
> +		return;
> +
> +	list_for_each_safe (i, tmp, &p->polled_events) {
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +		if (element->perf_event == event) {
> +			list_del(&element->list);
> +			element->perf_event = NULL;
> +			element->etm_event_data = NULL;
> +			spin_lock(&spinlock_re);
> +			list_add(&element->list, &registered_events);
> +			spin_unlock(&spinlock_re);
> +			if (list_empty(&p->polled_events)) {
> +				cancel_delayed_work(&p->delayed_work);
> +			}
> +			return;
> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_event_stop);

Ditto.

> +
> +/*
> + * The polling worker is a workqueue job which is periodically
> + * woken up to update the perf aux buffer from the etr shrink.
> + */
> +static void etr_perf_polling_worker(struct work_struct *work)
> +{
> +	unsigned long flags;
> +	int cpu = smp_processor_id();
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +	struct list_head *i, *tmp;
> +
> +	if (!atomic_read(&period))
> +		return;
> +
> +	/*
> +	 * Scheduling would do the same from the perf hooks,
> +	 * this should be done in one go.
> +	 */
> +	local_irq_save(flags);
> +	preempt_disable();

The locking usage is questionable...

local_irq_save() will disable the local interrupt, and it also
disables preemption; after disabling interrupt, it's needless to
disable preemption.

Here neither local_irq_save() nor preempt_disable() is the right
locking to be used; alternatively, should use the pair functions
spinlock(&spinlock_re) / spin_unlock(&spinlock_re), this is because it
needs to protect the poll list.

> +	/* Perf requires rcu lock. */
> +	rcu_read_lock();

Though it gives comments, I still don't understand why need to use
rcu_read_lock().  Which critical resource it protects?

> +
> +	polling_sched_worker(p);
> +
> +	list_for_each_safe (i, tmp, &p->polled_events) {
> +		struct list_head *path;
> +		struct coresight_device *sink;
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +
> +		path = etm_event_cpu_path(element->etm_event_data, cpu);
> +		if (WARN_ON(!path))
> +			continue;
> +		sink = coresight_get_sink(path);
> +		if (WARN_ON(!sink))
> +			continue;

When inserting the event into polling list, it have checked for path
and validate sink; so it's no need to check the path again.

> +		if (sink_ops(sink)->update_buffer) {
> +			int size, refcnt;
> +			struct tmc_drvdata *drvdata = dev_get_drvdata(sink->dev.parent);
> +
> +			/*
> +			 * Act as now we are the only users of the sink. Due to the locks
> +			 * we are safe.
> +			 */
> +			refcnt = atomic_xchg(sink->refcnt, 1);
> +			size = sink_ops(sink)->update_buffer(
> +				sink, element->ctx_handle,
> +				element->etm_event_data->snk_config);
> +			refcnt = atomic_xchg(sink->refcnt, refcnt);

This is tricky.  This change is like a workaround, it's better to
refactor the code for "sink->refcnt"; maybe we can refactor the coe
to track the sink's reference counter in the polling list rather than
in the low level driver.

> +			/*
> +			 * Restart the trace.
> +			 */
> +			if (element->tmc_etr_reset_hw)
> +				element->tmc_etr_reset_hw(drvdata);
> +
> +			WARN_ON(size < 0);
> +			if (size > 0) {
> +				struct etm_event_data *new_event_data;
> +
> +				perf_aux_output_end(element->ctx_handle, size);
> +				new_event_data = perf_aux_output_begin(
> +					element->ctx_handle,
> +					element->perf_event);
> +				if (WARN_ON(new_event_data == NULL))
> +					continue;
> +				element->etm_event_data = new_event_data;
> +				WARN_ON(new_event_data->snk_config !=
> +					element->etm_event_data->snk_config);
> +			}
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +	preempt_enable();
> +	local_irq_restore(flags);
> +}
> +
> +void etr_perf_polling_handle_register(struct perf_output_handle *handle,
> +				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata))
> +{
> +	struct polling_event_list *element;
> +
> +	element = kmalloc(sizeof(*element), GFP_KERNEL);
> +	if (WARN_ON(!element))
> +		return;
> +	memset(element, 0, sizeof(*element));
> +	element->ctx_handle = handle;
> +	element->tmc_etr_reset_hw = tmc_etr_reset_hw;
> +	spin_lock(&spinlock_re);
> +	list_add(&element->list, &registered_events);
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_handle_register);
> +
> +void etr_perf_polling_handle_deregister(struct perf_output_handle *handle)
> +{
> +	struct list_head *i, *tmp;
> +
> +	spin_lock(&spinlock_re);
> +	list_for_each_safe (i, tmp, &registered_events) {
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +		if (element->ctx_handle == handle) {
> +			list_del(&element->list);
> +			spin_unlock(&spinlock_re);
> +			kfree(element);
> +			return;
> +		}
> +	}
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_handle_deregister);
> +
> +void etr_perf_polling_init(void)
> +{
> +	int cpu;
> +	spin_lock_init(&spinlock_re);
> +	INIT_LIST_HEAD(&registered_events);
> +	atomic_set(&period, 0);
> +	for_each_possible_cpu (cpu) {
> +		struct polling *p = per_cpu_ptr(&polling, cpu);
> +		p->cpu = cpu;
> +		INIT_LIST_HEAD(&p->polled_events);
> +		INIT_DELAYED_WORK(&p->delayed_work, etr_perf_polling_worker);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_init);
> +
> +void etr_perf_polling_exit(void)
> +{
> +	int cpu;
> +	for_each_possible_cpu (cpu) {
> +		struct polling *p = per_cpu_ptr(&polling, cpu);
> +		cancel_delayed_work_sync(&p->delayed_work);
> +		WARN_ON(!list_empty(&p->polled_events));
> +	}
> +	WARN_ON(!list_empty(&registered_events));
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_exit);
> +
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.h b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> new file mode 100644
> index 0000000000000..5917e1fa408bb
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright(C) 2021 Arm Limited. All rights reserved.
> + * Author: Daniel Kiss <daniel.kiss@arm.com>
> + */
> +
> +#ifndef _CORESIGHT_ETM_PERF_POLLING_H
> +#define _CORESIGHT_ETM_PERF_POLLING_H
> +
> +#include <linux/coresight.h>
> +#include <linux/perf_event.h>
> +#include "coresight-etm-perf.h"
> +#include "coresight-tmc.h"
> +
> +#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
> +    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)
> +
> +void etr_perf_polling_init(void);
> +void etr_perf_polling_exit(void);
> +void etr_perf_polling_handle_register(struct perf_output_handle *handle,
> +				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata));
> +void etr_perf_polling_handle_deregister(struct perf_output_handle *handle);
> +void etr_perf_polling_event_start(struct perf_event *event,
> +				  struct etm_event_data *etm_event_data,
> +				  struct perf_output_handle *ctx_handle);
> +void etr_perf_polling_event_stop(struct perf_event *event,
> +				 struct etm_event_data *etm_event_data);
> +
> +extern const struct attribute_group coresight_tmc_polling_group;
> +#define CORESIGHT_TMP_POLLING_GROUP &coresight_tmc_polling_group,
> +
> +#else /* !CONFIG_CORESIGHT_LINK_AND_SINK_TMC */
> +#define etr_perf_polling_init()
> +#define etr_perf_polling_exit()
> +#define etr_perf_polling_handle_register(...)
> +#define etr_perf_polling_handle_deregister(...)
> +#define etr_perf_polling_event_start(...)
> +#define etr_perf_polling_event_stop(...)
> +#define CORESIGHT_TMP_POLLING_GROUP
> +#endif
> +
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> index 74c6323d4d6ab..51e705ef3ffa3 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> @@ -26,6 +26,7 @@
>  
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
> +#include "coresight-etr-perf-polling.h"
>  
>  DEFINE_CORESIGHT_DEVLIST(etb_devs, "tmc_etb");
>  DEFINE_CORESIGHT_DEVLIST(etf_devs, "tmc_etf");
> @@ -365,6 +366,7 @@ static const struct attribute_group coresight_tmc_mgmt_group = {
>  static const struct attribute_group *coresight_tmc_groups[] = {
>  	&coresight_tmc_group,
>  	&coresight_tmc_mgmt_group,
> +	CORESIGHT_TMP_POLLING_GROUP

This is a bit wired for me.  It's more readable with:

#ifdef CONFIG_CORESIGHT_ETM_PERF_POLL
        &coresight_tmc_polling_group,
#endif

>  	NULL,
>  };
>  
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index bf9f6311d8663..021b594e38e71 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -16,6 +16,7 @@
>  #include <linux/vmalloc.h>
>  #include "coresight-catu.h"
>  #include "coresight-etm-perf.h"
> +#include "coresight-etr-perf-polling.h"
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> @@ -1139,6 +1140,12 @@ void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	drvdata->etr_buf = NULL;
>  }
>  
> +static void tmc_etr_reset_hw(struct tmc_drvdata *drvdata)
> +{
> +	__tmc_etr_disable_hw(drvdata);
> +	__tmc_etr_enable_hw(drvdata);
> +}
> +
>  static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  {
>  	int ret = 0;
> @@ -1630,6 +1637,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
>  		drvdata->mode = CS_MODE_PERF;
>  		drvdata->perf_buf = etr_perf->etr_buf;
>  		drvdata->perf_handle = handle;
> +		etr_perf_polling_handle_register(handle, tmc_etr_reset_hw);
>  		atomic_inc(csdev->refcnt);
>  	}
>  
> @@ -1677,6 +1685,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
>  	drvdata->mode = CS_MODE_DISABLED;
>  	/* Reset perf specific data */
>  	drvdata->perf_buf = NULL;
> +	etr_perf_polling_handle_deregister(drvdata->perf_handle);
>  	drvdata->perf_handle = NULL;
>  
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -- 
> 2.25.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
  2021-04-23  8:23   ` Leo Yan
@ 2021-04-26 10:40   ` Suzuki K Poulose
  2021-04-27  3:45     ` Leo Yan
  1 sibling, 1 reply; 35+ messages in thread
From: Suzuki K Poulose @ 2021-04-26 10:40 UTC (permalink / raw)
  To: Daniel Kiss, mathieu.poirier, mike.leach, leo.yan, coresight,
	linux-arm-kernel
  Cc: denik, Branislav Rankov

On 21/04/2021 13:04, Daniel Kiss wrote:
> With polling the sync could called multiple times in a row. Without this
> change the data will be overwritten at the beginning of the buffer.
> 
> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>
> ---
>   drivers/hwtracing/coresight/coresight-tmc-etr.c | 5 +++--
>   1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index ea5027e397d02..dd19d1d1c3b38 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>   {
>   	long bytes;
>   	long pg_idx, pg_offset;
> -	unsigned long head = etr_perf->head;
> +	unsigned long head;
>   	char **dst_pages, *src_buf;
>   	struct etr_buf *etr_buf = etr_perf->etr_buf;
>   
> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>   		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>   					     &src_buf);
>   		if (WARN_ON_ONCE(bytes <= 0))
> -			break;
> +			return;
>   		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>   
>   		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>   		/* Move source pointers */
>   		src_offset += bytes;
>   	}
> +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;


Looking at this patch, I feel the driver is doing a couple wrong things
already.

1) We initialise etr_perf->head every time the ETR enable is called, 
irrespective of whether we actually try to enable the Hardware. e.g,

etm_0 on -> .. -> enable_etr :
etr_perf->head = <head of the handle_0>
   enable_hw()

emt_1 on -> ... -> enable_etr:
   etr_perf->head = <head of the handle_1>
   already_enabled, skip enable_hw()

etm_2 on -> ... -> enable_etr:
   etr_perf->head = <head of the handle_2>
   already_enable, skip enable_hw()...


This doesn't look correct as we don't know which handle is going to get 
the data. This looks pointless.

2) Even more problematic is where we copy the AUX buffer content to.
As mentioned above, we don't know which handle is going to be the last
one to consume and we have a "etr_perf->head" that came from one of the 
handles and the "pages" that came from the first handle which created a
etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
the "pages" (say of handle_0) with "etr_perf->head" (which could be from
any other handle, say handle_2) and then we could return the number of 
bytes copied, which then is used to update the last handle (could be say 
handle_3), where there is no actual data copied.

To fix all of these issues, we must
1) Stop using etr_perf->head, and instead use the handle->head where we 
are called update_buffer on.

2) Keep track of the "pages" that belong to a given "handle" and then 
use those pages to copy the data to the current handle we are called to 
update the buffer on.

Did I get this wrong ?

Suzuki

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
                   ` (3 preceding siblings ...)
  2021-04-21 12:04 ` [PATCH 4/4] coresight: Add ETR-PERF polling Daniel Kiss
@ 2021-04-26 17:54 ` Mathieu Poirier
  2021-04-27 10:43   ` Al Grant
  2021-04-27 16:24 ` James Clark
  5 siblings, 1 reply; 35+ messages in thread
From: Mathieu Poirier @ 2021-04-26 17:54 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: suzuki.poulose, mike.leach, leo.yan, coresight, linux-arm-kernel, denik

Hi Daniel,

On Wed, Apr 21, 2021 at 02:04:09PM +0200, Daniel Kiss wrote:
> This series adds a feature to ETR-PERF that sync the ETR buffer to perf
> periodically. This is really handy when the system wide trace is used
> because in this case the perf won't sync during the trace. In a per-thread
> setup the traced program might not go to the kernel frequvently enought
> to collect trace. Polling helps in both usecases. Can be used with strobing.
> Tuning polling period is challanging, I'm working on an additional patch
> that adds some metrics to help tune the polling period.
>

Suzuki and Leo have already commented on a number of problems with this set and
as such I will concentrate on the general idea.

Over the years we have thought long and hard about fixing the overflow issues
created by the lack of interrupt when a sink gets full, installing a timer to
empty the sink buffer at regular intervals is one of them.  Ultimately we
haven't moved forward with the idea because it requires to stop the sink when an
event is active, something that introduces more trace data loss.  

To me this kind of interval snapshot should be achieved using Mike's new
strobing feature that came bundled with the complex configuration framework,
available on next-ETE-TRBE[1].  I will rebase that branch to 5.13-rc1 when it
is released in a couple of weeks from now.

Thanks,
Mathieu

PS: Always run your work through checkpatch.pl before sending a patchset for
review. 

[1]. https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/log/?h=next-ETE-TRBE
 
> Daniel Kiss (4):
>   coresight: tmc-etr: Advance buffer pointer in sync buffer.
>   coresight: tmc-etr: Track perf handler.
>   coresight: etm-perf: Export etm_event_cpu_path.
>   coresight: Add ETR-PERF polling.
> 
>  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>  drivers/hwtracing/coresight/Makefile          |   2 +-
>  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
>  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
>  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>  .../coresight/coresight-etr-perf-polling.h    |  42 +++
>  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
>  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
>  9 files changed, 401 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> 
> -- 
> 2.25.1
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-26 10:40   ` Suzuki K Poulose
@ 2021-04-27  3:45     ` Leo Yan
  2021-04-27 10:00       ` Suzuki K Poulose
  0 siblings, 1 reply; 35+ messages in thread
From: Leo Yan @ 2021-04-27  3:45 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Daniel Kiss, mathieu.poirier, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:

[...]

> > @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >   {
> >   	long bytes;
> >   	long pg_idx, pg_offset;
> > -	unsigned long head = etr_perf->head;
> > +	unsigned long head;
> >   	char **dst_pages, *src_buf;
> >   	struct etr_buf *etr_buf = etr_perf->etr_buf;
> > @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >   		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
> >   					     &src_buf);
> >   		if (WARN_ON_ONCE(bytes <= 0))
> > -			break;
> > +			return;
> >   		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
> >   		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
> > @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
> >   		/* Move source pointers */
> >   		src_offset += bytes;
> >   	}
> > +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
> 
> 
> Looking at this patch, I feel the driver is doing a couple wrong things
> already.
> 
> 1) We initialise etr_perf->head every time the ETR enable is called,
> irrespective of whether we actually try to enable the Hardware. e.g,
> 
> etm_0 on -> .. -> enable_etr :
> etr_perf->head = <head of the handle_0>
>   enable_hw()
> 
> emt_1 on -> ... -> enable_etr:
>   etr_perf->head = <head of the handle_1>
>   already_enabled, skip enable_hw()
> 
> etm_2 on -> ... -> enable_etr:
>   etr_perf->head = <head of the handle_2>
>   already_enable, skip enable_hw()...
> 
> 
> This doesn't look correct as we don't know which handle is going to get the
> data. This looks pointless.

I'd like to convert mapping into below diagram (for system wide trace):

  CPU0: AUX RB (perf_output_handle_0) -> etr_perf ->  +---------+
  CPU1: AUX RB (perf_output_handle_1) -> etr_perf ->  | etr_buf |
  CPU2: AUX RB (perf_output_handle_2) -> etr_perf ->  |         |
  CPU3: AUX RB (perf_output_handle_3) -> etr_perf ->  +---------+

Simply to say, there have two layers for controlling ring buffer, one
layer is for perf AUX ring buffer, it mainly uses the structure
perf_output_handle to manage the ring buffer.  And in the ETR driver,
it uses structure etr_perf to manage the header pointer for copying
data into ETR buffer (tagged as "etr_buf").

ETR buffer is the single one, but the structures "perf_output_handle"
and "etr_perf" are per CPU.  We have multiple copies for the headers and
tails to manage a single buffer, but the problem is these multiple
copies have not been synced with each other.

> 2) Even more problematic is where we copy the AUX buffer content to.
> As mentioned above, we don't know which handle is going to be the last
> one to consume and we have a "etr_perf->head" that came from one of the
> handles and the "pages" that came from the first handle which created a
> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
> the "pages" (say of handle_0) with "etr_perf->head" (which could be from
> any other handle, say handle_2) and then we could return the number of bytes
> copied, which then is used to update the last handle (could be say
> handle_3), where there is no actual data copied.
> 
> To fix all of these issues, we must
> 1) Stop using etr_perf->head, and instead use the handle->head where we are
> called update_buffer on.
> 
> 2) Keep track of the "pages" that belong to a given "handle" and then use
> those pages to copy the data to the current handle we are called to update
> the buffer on.

The "pages" are only allocated once, even they are attached to multiple
handles.  I think the right way is to use the single structure
"etr_perf" and single "perf_output_handle" to manage the "pages", IOW,
if there have single buffer, then we just use one copy of header and
tail to manage it.

The difficult thing is how to use the single one "perf_output_handle"
to manage the AUX ring buffer and notify to user space.  I am
wandering if we can only use CPU0's perf_output_handle to manage the
AUX ring buffer, if any other CPUs read out the data, they always use
CPU0's perf_output_handle.

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-27  3:45     ` Leo Yan
@ 2021-04-27 10:00       ` Suzuki K Poulose
  2021-04-28  2:34         ` Leo Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Suzuki K Poulose @ 2021-04-27 10:00 UTC (permalink / raw)
  To: Leo Yan
  Cc: Daniel Kiss, mathieu.poirier, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On 27/04/2021 04:45, Leo Yan wrote:
> On Mon, Apr 26, 2021 at 11:40:44AM +0100, Suzuki Kuruppassery Poulose wrote:
> 
> [...]
> 
>>> @@ -1442,7 +1442,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    {
>>>    	long bytes;
>>>    	long pg_idx, pg_offset;
>>> -	unsigned long head = etr_perf->head;
>>> +	unsigned long head;
>>>    	char **dst_pages, *src_buf;
>>>    	struct etr_buf *etr_buf = etr_perf->etr_buf;
>>> @@ -1465,7 +1465,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    		bytes = tmc_etr_buf_get_data(etr_buf, src_offset, to_copy,
>>>    					     &src_buf);
>>>    		if (WARN_ON_ONCE(bytes <= 0))
>>> -			break;
>>> +			return;
>>>    		bytes = min(bytes, (long)(PAGE_SIZE - pg_offset));
>>>    		memcpy(dst_pages[pg_idx] + pg_offset, src_buf, bytes);
>>> @@ -1483,6 +1483,7 @@ static void tmc_etr_sync_perf_buffer(struct etr_perf_buffer *etr_perf,
>>>    		/* Move source pointers */
>>>    		src_offset += bytes;
>>>    	}
>>> +	etr_perf->head = (pg_idx << PAGE_SHIFT) + pg_offset;
>>
>>
>> Looking at this patch, I feel the driver is doing a couple wrong things
>> already.
>>
>> 1) We initialise etr_perf->head every time the ETR enable is called,
>> irrespective of whether we actually try to enable the Hardware. e.g,
>>
>> etm_0 on -> .. -> enable_etr :
>> etr_perf->head = <head of the handle_0>
>>    enable_hw()
>>
>> emt_1 on -> ... -> enable_etr:
>>    etr_perf->head = <head of the handle_1>
>>    already_enabled, skip enable_hw()
>>
>> etm_2 on -> ... -> enable_etr:
>>    etr_perf->head = <head of the handle_2>
>>    already_enable, skip enable_hw()...
>>
>>
>> This doesn't look correct as we don't know which handle is going to get the
>> data. This looks pointless.
> 
> I'd like to convert mapping into below diagram (for system wide trace):
> 
>    CPU0: AUX RB (perf_output_handle_0) -> etr_perf ->  +---------+
>    CPU1: AUX RB (perf_output_handle_1) -> etr_perf ->  | etr_buf |
>    CPU2: AUX RB (perf_output_handle_2) -> etr_perf ->  |         |
>    CPU3: AUX RB (perf_output_handle_3) -> etr_perf ->  +---------+ >

To make it more clear:

     CPU0: AUX RB (perf_output_handle_0) -> etr_perf0 ->  +---------+
     CPU1: AUX RB (perf_output_handle_1) -> etr_perf1 ->  |etr_buf0 |
     CPU2: AUX RB (perf_output_handle_2) -> etr_perf2 ->  |         |
     CPU3: AUX RB (perf_output_handle_3) -> etr_perf3 ->  +---------+


> Simply to say, there have two layers for controlling ring buffer, one
> layer is for perf AUX ring buffer, it mainly uses the structure
> perf_output_handle to manage the ring buffer.  And in the ETR driver,
> it uses structure etr_perf to manage the header pointer for copying
> data into ETR buffer (tagged as "etr_buf").
> 
> ETR buffer is the single one, but the structures "perf_output_handle"
> and "etr_perf" are per CPU.  We have multiple copies for the headers and

minor Correction, they are "per-event" to be precise. And there are 
events per-CPU in a system wide mode or task mode (but not per-thread 
mode). So, you are correct

> tails to manage a single buffer, but the problem is these multiple
> copies have not been synced with each other.
> 
>> 2) Even more problematic is where we copy the AUX buffer content to.
>> As mentioned above, we don't know which handle is going to be the last
>> one to consume and we have a "etr_perf->head" that came from one of the
>> handles and the "pages" that came from the first handle which created a
>> etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
>> the "pages" (say of handle_0) with "etr_perf->head" (which could be from
>> any other handle, say handle_2) and then we could return the number of bytes
>> copied, which then is used to update the last handle (could be say
>> handle_3), where there is no actual data copied.

This is not valid and am relieved that the driver is correct. The 
assumption that there is only one etr_perf per ETR is incorrect as
pictured above.

>>
>> To fix all of these issues, we must
>> 1) Stop using etr_perf->head, and instead use the handle->head where we are
>> called update_buffer on.
>>
>> 2) Keep track of the "pages" that belong to a given "handle" and then use
>> those pages to copy the data to the current handle we are called to update
>> the buffer on.
> 
> The "pages" are only allocated once, even they are attached to multiple
> handles.  I think the right way is to use the single structure

I assume you mean the pages in the etr_buf and not etr_perf right ?

> "etr_perf" and single "perf_output_handle" to manage the "pages", IOW,
> if there have single buffer, then we just use one copy of header and
> tail to manage it.

I think this is not needed and the way we do things are fine and the 
patch as such looks correct to me.

The perf_output_handle is per-event and nothing that we can combine 
with. etr_perf captures what the "ouput_handle" stands for and is 
something necessary for syncing the buffer.

Now coming back to this patch, I understand that the sync_perf could be 
called with the polling patches multiple times. But don't we do a
perf_output_handle_end() each of the time we wake up ? (I haven't looked
at the later patches yet).

I would expect:

   perf_aux_output_begin() -> update the etr_perf-> head

   when we sync the buffer, we do :

  Poll-> sync_buffer-> perf_aux_output_end() and perf_aux_output_begin() 
-> update etr_perf->head.


Kind regards
Suzuki


_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* RE: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-26 17:54 ` [PATCH 0/4] " Mathieu Poirier
@ 2021-04-27 10:43   ` Al Grant
  2021-04-27 14:41     ` Mike Leach
  0 siblings, 1 reply; 35+ messages in thread
From: Al Grant @ 2021-04-27 10:43 UTC (permalink / raw)
  To: Mathieu Poirier, Daniel Kiss
  Cc: coresight, denik, linux-arm-kernel, mike.leach

> Hi Daniel,
> 
> On Wed, Apr 21, 2021 at 02:04:09PM +0200, Daniel Kiss wrote:
> > This series adds a feature to ETR-PERF that sync the ETR buffer to
> > perf periodically. This is really handy when the system wide trace is
> > used because in this case the perf won't sync during the trace. In a
> > per-thread setup the traced program might not go to the kernel
> > frequvently enought to collect trace. Polling helps in both usecases. Can be
> used with strobing.
> > Tuning polling period is challanging, I'm working on an additional
> > patch that adds some metrics to help tune the polling period.
> >
> 
> Suzuki and Leo have already commented on a number of problems with this set
> and as such I will concentrate on the general idea.
> 
> Over the years we have thought long and hard about fixing the overflow issues
> created by the lack of interrupt when a sink gets full, installing a timer to empty
> the sink buffer at regular intervals is one of them.  Ultimately we haven't moved
> forward with the idea because it requires to stop the sink when an event is
> active, something that introduces more trace data loss.
> 
> To me this kind of interval snapshot should be achieved using Mike's new
> strobing feature that came bundled with the complex configuration framework,
> available on next-ETE-TRBE[1].  I will rebase that branch to 5.13-rc1 when it is
> released in a couple of weeks from now.

It's important to understand what strobing is. It acts internally to the ETM
and switches the ETM on for a time and then off for a time. It is as the
name suggests, like a stroboscope (or a lighthouse).

There is no synchronization between the on-periods of different ETMs.
When you have multiple ETMs funnelling into a common ETR, strobing
does not guarantee you a window where you can safely harvest the buffer.
It achieves a reduction in the overall bandwidth of trace being dumped
into the buffer, and there may be times when no trace is being written
at all because all the ETMs are in their off-period.

At worst, it may create a false sense of security - tests that consistently
fail without strobing, may pass often enough with strobing to create the
impression that strobing has solved the problem. But these tests are also
likely to fail eventually with strobing. To fix this problem without
disabling either ETR or ETMs you would have to guarantee that you can
harvest the ETR buffer in less time than it takes to fill it. That would need 
very careful quntitative arguments to be made about:

 - the rate of trace generation by each ETM (as modified by strobing)

 - the number of ETMs writing into the buffer

 - the time available to the kernel to harvest the buffer

So if there are 10 ETMs generating trace at average 1Gb/s into a 1Mb
buffer, the buffer will fill in 100us, and that gives the kernel 100us to
harvest the buffer before its read pointer is caught up by the ETR's
advancing write pointer. If strobing is used to reduce average ETM rate
to 100Mb/s the kernel has 1ms to read the buffer, and so on. In short
the kernel must *guarantee* a minimum readout rate equal to the
maximum aggregate write rate of the ETMs. But can the kernel
guarantee any minimum readout rate at all?

The alternative would be double-buffering the ETR, which we've 
also discussed - so while the kernel is harvesting the contents of one
buffer, the ETR is writing (and possibly wrapping) the other.
Some trace will still be lost but it does mean the kernel will be
harvesting monotonically increasing sequences of trace, and won't be
seeing artefacts from its reads colliding with the ETR's writes.

Al


> 
> Thanks,
> Mathieu
> 
> PS: Always run your work through checkpatch.pl before sending a patchset for
> review.
> 
> [1].
> https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/log/?h=next-
> ETE-TRBE
> 
> > Daniel Kiss (4):
> >   coresight: tmc-etr: Advance buffer pointer in sync buffer.
> >   coresight: tmc-etr: Track perf handler.
> >   coresight: etm-perf: Export etm_event_cpu_path.
> >   coresight: Add ETR-PERF polling.
> >
> >  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
> >  drivers/hwtracing/coresight/Makefile          |   2 +-
> >  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
> >  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
> >  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
> >  .../coresight/coresight-etr-perf-polling.h    |  42 +++
> >  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
> >  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
> >  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
> >  9 files changed, 401 insertions(+), 4 deletions(-)  create mode
> > 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> >  create mode 100644
> > drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> >
> > --
> > 2.25.1
> >
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 10:43   ` Al Grant
@ 2021-04-27 14:41     ` Mike Leach
  2021-04-27 15:47       ` Mathieu Poirier
  0 siblings, 1 reply; 35+ messages in thread
From: Mike Leach @ 2021-04-27 14:41 UTC (permalink / raw)
  To: Mathieu Poirier, coresight; +Cc: Al Grant, Daniel Kiss, denik, linux-arm-kernel

Hi Mathieu,

I thought I'd add a little backgound to what has been said so far...

On Tue, 27 Apr 2021 at 11:43, Al Grant <Al.Grant@arm.com> wrote:
>
> > Hi Daniel,
> >
> > On Wed, Apr 21, 2021 at 02:04:09PM +0200, Daniel Kiss wrote:
> > > This series adds a feature to ETR-PERF that sync the ETR buffer to
> > > perf periodically. This is really handy when the system wide trace is
> > > used because in this case the perf won't sync during the trace. In a
> > > per-thread setup the traced program might not go to the kernel
> > > frequvently enought to collect trace. Polling helps in both usecases. Can be
> > used with strobing.
> > > Tuning polling period is challanging, I'm working on an additional
> > > patch that adds some metrics to help tune the polling period.
> > >
> >
> > Suzuki and Leo have already commented on a number of problems with this set
> > and as such I will concentrate on the general idea.
> >
> > Over the years we have thought long and hard about fixing the overflow issues
> > created by the lack of interrupt when a sink gets full, installing a timer to empty
> > the sink buffer at regular intervals is one of them.  Ultimately we haven't moved
> > forward with the idea because it requires to stop the sink when an event is
> > active, something that introduces more trace data loss.
> >
> > To me this kind of interval snapshot should be achieved using Mike's new
> > strobing feature that came bundled with the complex configuration framework,
> > available on next-ETE-TRBE[1].  I will rebase that branch to 5.13-rc1 when it is
> > released in a couple of weeks from now.
>
> It's important to understand what strobing is. It acts internally to the ETM
> and switches the ETM on for a time and then off for a time. It is as the
> name suggests, like a stroboscope (or a lighthouse).
>
> There is no synchronization between the on-periods of different ETMs.
> When you have multiple ETMs funnelling into a common ETR, strobing
> does not guarantee you a window where you can safely harvest the buffer.
> It achieves a reduction in the overall bandwidth of trace being dumped
> into the buffer, and there may be times when no trace is being written
> at all because all the ETMs are in their off-period.
>
> At worst, it may create a false sense of security - tests that consistently
> fail without strobing, may pass often enough with strobing to create the
> impression that strobing has solved the problem. But these tests are also
> likely to fail eventually with strobing. To fix this problem without
> disabling either ETR or ETMs you would have to guarantee that you can
> harvest the ETR buffer in less time than it takes to fill it. That would need
> very careful quntitative arguments to be made about:
>
>  - the rate of trace generation by each ETM (as modified by strobing)
>
>  - the number of ETMs writing into the buffer
>
>  - the time available to the kernel to harvest the buffer
>
> So if there are 10 ETMs generating trace at average 1Gb/s into a 1Mb
> buffer, the buffer will fill in 100us, and that gives the kernel 100us to
> harvest the buffer before its read pointer is caught up by the ETR's
> advancing write pointer. If strobing is used to reduce average ETM rate
> to 100Mb/s the kernel has 1ms to read the buffer, and so on. In short
> the kernel must *guarantee* a minimum readout rate equal to the
> maximum aggregate write rate of the ETMs. But can the kernel
> guarantee any minimum readout rate at all?
>
> The alternative would be double-buffering the ETR, which we've
> also discussed - so while the kernel is harvesting the contents of one
> buffer, the ETR is writing (and possibly wrapping) the other.
> Some trace will still be lost but it does mean the kernel will be
> harvesting monotonically increasing sequences of trace, and won't be
> seeing artefacts from its reads colliding with the ETR's writes.
>
> Al
>

As Al mentions, ETR polling is designed to solve a different issue
than ETM strobing.  These two techniques can be used together or
separately.

It was noticed by users that the amount of trace captured during a
given trace run would vary greatly even when tracing the same
application for the same length of time.
This was also found to be sensitive to process scheduling - frequent
re-scheds did seem to result in more frequent ETR updates and more
trace data collected. If perf does not wake up during a trace run then
the ETR may wrap mulitple times and all the data  will be a single
buffer biased towards the end of the trace session.

ETR polling is designed to ensure that more trace data is collected
consistently across the whole of the trace session. There are issues
of course, with stopping collection without stopping the sources. -
shared to some extent by the ETE / TRBE combination.
This can result in incomplete packets and other trace discontinuities.
For this reason it is necessary to ensure that the decoder is
restarted for each block of trace captured  - which is where the patch
set from James that does this using AUX records in perf to correctly
split the AUXTRACE records into valid blocks is needed.

In summary:-
1) ETM strobing samples trace to allow greater coverage of the program
being traced for a given buffer. This is useful when building
statistical profiles such as for AutoFDO
2) ETR polling ensures that more trace is collected across the entire
trace session - seeking to reduce inconsistent capture volumes.
3) Use AUX records to split the AUXTRACE buffer into valid capture
blocks and reset the decoder at the start of these blocks. This is
essential for ETE+TRBE, the ETR polling, and systems where we are
seeing hardware errata around the flush process causing similar
spurious packets. (an alternative for the ETR polling / flush errata
might be to insert barrier packets to force a decoder reset for every
ETR block copied to the perf buffer - but this does not work for
ETE/TRBE that uses no CoreSight formatted framing).

Regards

Mike


>
> >
> > Thanks,
> > Mathieu
> >
> > PS: Always run your work through checkpatch.pl before sending a patchset for
> > review.
> >
> > [1].
> > https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/log/?h=next-
> > ETE-TRBE
> >
> > > Daniel Kiss (4):
> > >   coresight: tmc-etr: Advance buffer pointer in sync buffer.
> > >   coresight: tmc-etr: Track perf handler.
> > >   coresight: etm-perf: Export etm_event_cpu_path.
> > >   coresight: Add ETR-PERF polling.
> > >
> > >  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
> > >  drivers/hwtracing/coresight/Makefile          |   2 +-
> > >  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
> > >  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
> > >  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
> > >  .../coresight/coresight-etr-perf-polling.h    |  42 +++
> > >  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
> > >  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
> > >  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
> > >  9 files changed, 401 insertions(+), 4 deletions(-)  create mode
> > > 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> > >  create mode 100644
> > > drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> > >
> > > --
> > > 2.25.1
> > >
> > _______________________________________________
> > CoreSight mailing list
> > CoreSight@lists.linaro.org
> > https://lists.linaro.org/mailman/listinfo/coresight



--
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 14:41     ` Mike Leach
@ 2021-04-27 15:47       ` Mathieu Poirier
  2021-04-27 16:04         ` Leo Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Mathieu Poirier @ 2021-04-27 15:47 UTC (permalink / raw)
  To: Mike Leach; +Cc: coresight, Al Grant, Daniel Kiss, denik, linux-arm-kernel

Good day Mike,

On Tue, Apr 27, 2021 at 03:41:01PM +0100, Mike Leach wrote:
> Hi Mathieu,
> 
> I thought I'd add a little backgound to what has been said so far...
> 
> On Tue, 27 Apr 2021 at 11:43, Al Grant <Al.Grant@arm.com> wrote:
> >
> > > Hi Daniel,
> > >
> > > On Wed, Apr 21, 2021 at 02:04:09PM +0200, Daniel Kiss wrote:
> > > > This series adds a feature to ETR-PERF that sync the ETR buffer to
> > > > perf periodically. This is really handy when the system wide trace is
> > > > used because in this case the perf won't sync during the trace. In a
> > > > per-thread setup the traced program might not go to the kernel
> > > > frequvently enought to collect trace. Polling helps in both usecases. Can be
> > > used with strobing.
> > > > Tuning polling period is challanging, I'm working on an additional
> > > > patch that adds some metrics to help tune the polling period.
> > > >
> > >
> > > Suzuki and Leo have already commented on a number of problems with this set
> > > and as such I will concentrate on the general idea.
> > >
> > > Over the years we have thought long and hard about fixing the overflow issues
> > > created by the lack of interrupt when a sink gets full, installing a timer to empty
> > > the sink buffer at regular intervals is one of them.  Ultimately we haven't moved
> > > forward with the idea because it requires to stop the sink when an event is
> > > active, something that introduces more trace data loss.
> > >
> > > To me this kind of interval snapshot should be achieved using Mike's new
> > > strobing feature that came bundled with the complex configuration framework,
> > > available on next-ETE-TRBE[1].  I will rebase that branch to 5.13-rc1 when it is
> > > released in a couple of weeks from now.
> >
> > It's important to understand what strobing is. It acts internally to the ETM
> > and switches the ETM on for a time and then off for a time. It is as the
> > name suggests, like a stroboscope (or a lighthouse).
> >
> > There is no synchronization between the on-periods of different ETMs.
> > When you have multiple ETMs funnelling into a common ETR, strobing
> > does not guarantee you a window where you can safely harvest the buffer.
> > It achieves a reduction in the overall bandwidth of trace being dumped
> > into the buffer, and there may be times when no trace is being written
> > at all because all the ETMs are in their off-period.
> >
> > At worst, it may create a false sense of security - tests that consistently
> > fail without strobing, may pass often enough with strobing to create the
> > impression that strobing has solved the problem. But these tests are also
> > likely to fail eventually with strobing. To fix this problem without
> > disabling either ETR or ETMs you would have to guarantee that you can
> > harvest the ETR buffer in less time than it takes to fill it. That would need
> > very careful quntitative arguments to be made about:
> >
> >  - the rate of trace generation by each ETM (as modified by strobing)
> >
> >  - the number of ETMs writing into the buffer
> >
> >  - the time available to the kernel to harvest the buffer
> >
> > So if there are 10 ETMs generating trace at average 1Gb/s into a 1Mb
> > buffer, the buffer will fill in 100us, and that gives the kernel 100us to
> > harvest the buffer before its read pointer is caught up by the ETR's
> > advancing write pointer. If strobing is used to reduce average ETM rate
> > to 100Mb/s the kernel has 1ms to read the buffer, and so on. In short
> > the kernel must *guarantee* a minimum readout rate equal to the
> > maximum aggregate write rate of the ETMs. But can the kernel
> > guarantee any minimum readout rate at all?
> >
> > The alternative would be double-buffering the ETR, which we've
> > also discussed - so while the kernel is harvesting the contents of one
> > buffer, the ETR is writing (and possibly wrapping) the other.
> > Some trace will still be lost but it does mean the kernel will be
> > harvesting monotonically increasing sequences of trace, and won't be
> > seeing artefacts from its reads colliding with the ETR's writes.
> >
> > Al
> >
> 
> As Al mentions, ETR polling is designed to solve a different issue
> than ETM strobing.  These two techniques can be used together or
> separately.
> 
> It was noticed by users that the amount of trace captured during a
> given trace run would vary greatly even when tracing the same
> application for the same length of time.

Indeed, that problem is well known.

> This was also found to be sensitive to process scheduling - frequent
> re-scheds did seem to result in more frequent ETR updates and more
> trace data collected. If perf does not wake up during a trace run then
> the ETR may wrap mulitple times and all the data  will be a single
> buffer biased towards the end of the trace session.
> 

Right.

> ETR polling is designed to ensure that more trace data is collected
> consistently across the whole of the trace session. There are issues
> of course, with stopping collection without stopping the sources. -
> shared to some extent by the ETE / TRBE combination.
> This can result in incomplete packets and other trace discontinuities.
> For this reason it is necessary to ensure that the decoder is
> restarted for each block of trace captured  - which is where the patch
> set from James that does this using AUX records in perf to correctly
> split the AUXTRACE records into valid blocks is needed.

I am still waiting for a new revision from James.

> 
> In summary:-
> 1) ETM strobing samples trace to allow greater coverage of the program
> being traced for a given buffer. This is useful when building
> statistical profiles such as for AutoFDO
> 2) ETR polling ensures that more trace is collected across the entire
> trace session - seeking to reduce inconsistent capture volumes.

I am not convinced disabling a sink to collect traces while an
event is active is the right way to go.  To me it will add (more) complexity to
the coresight subsystem for very little gains, if any.

If I remember correctly Leo brought forward the exact same idea about a year ago
and after discussion, we all agreed the benefit would not be important enough to
offset the drawbacks.

As usual I am open to discussion and my opinion is not set in stone.  But as I
mentioned I worry the feature will increase complexity in the driver and
produce dubious results.  And we also have to factor in usability which, as
Al pointed, out will be a problem. 

> 3) Use AUX records to split the AUXTRACE buffer into valid capture
> blocks and reset the decoder at the start of these blocks. This is
> essential for ETE+TRBE, the ETR polling, and systems where we are
> seeing hardware errata around the flush process causing similar
> spurious packets. (an alternative for the ETR polling / flush errata
> might be to insert barrier packets to force a decoder reset for every
> ETR block copied to the perf buffer - but this does not work for
> ETE/TRBE that uses no CoreSight formatted framing).
> 
> Regards
> 
> Mike
> 
> 
> >
> > >
> > > Thanks,
> > > Mathieu
> > >
> > > PS: Always run your work through checkpatch.pl before sending a patchset for
> > > review.
> > >
> > > [1].
> > > https://git.kernel.org/pub/scm/linux/kernel/git/coresight/linux.git/log/?h=next-
> > > ETE-TRBE
> > >
> > > > Daniel Kiss (4):
> > > >   coresight: tmc-etr: Advance buffer pointer in sync buffer.
> > > >   coresight: tmc-etr: Track perf handler.
> > > >   coresight: etm-perf: Export etm_event_cpu_path.
> > > >   coresight: Add ETR-PERF polling.
> > > >
> > > >  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
> > > >  drivers/hwtracing/coresight/Makefile          |   2 +-
> > > >  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
> > > >  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
> > > >  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
> > > >  .../coresight/coresight-etr-perf-polling.h    |  42 +++
> > > >  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
> > > >  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
> > > >  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
> > > >  9 files changed, 401 insertions(+), 4 deletions(-)  create mode
> > > > 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> > > >  create mode 100644
> > > > drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> > > >
> > > > --
> > > > 2.25.1
> > > >
> > > _______________________________________________
> > > CoreSight mailing list
> > > CoreSight@lists.linaro.org
> > > https://lists.linaro.org/mailman/listinfo/coresight
> 
> 
> 
> --
> Mike Leach
> Principal Engineer, ARM Ltd.
> Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 15:47       ` Mathieu Poirier
@ 2021-04-27 16:04         ` Leo Yan
  2021-05-05  6:46           ` Denis Nikitin
  0 siblings, 1 reply; 35+ messages in thread
From: Leo Yan @ 2021-04-27 16:04 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Mike Leach, coresight, denik, linux-arm-kernel, Daniel Kiss

On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:

[...]

> > 2) ETR polling ensures that more trace is collected across the entire
> > trace session - seeking to reduce inconsistent capture volumes.
> 
> I am not convinced disabling a sink to collect traces while an
> event is active is the right way to go.  To me it will add (more) complexity to
> the coresight subsystem for very little gains, if any.
> 
> If I remember correctly Leo brought forward the exact same idea about a year ago
> and after discussion, we all agreed the benefit would not be important enough to
> offset the drawbacks.
> 
> As usual I am open to discussion and my opinion is not set in stone.  But as I
> mentioned I worry the feature will increase complexity in the driver and
> produce dubious results.  And we also have to factor in usability which, as
> Al pointed, out will be a problem. 

Just want to remind one thing for ETR polling.  From one perspective,
the ETR polling mode is actually very similar with perf's snapshot
mode.  E.g. we can use specific interval to send USR2 singal to perf
tool to captcure CoreSight trace data, thus it also can record the
trace data continuously.

I can see a benefit from ETR polling mode is it might introduce less
overhead than perf snapshot mode.  The kernel's mechanism (workqueue
or kernel thread) will be much efficiency than perf's signal handling
+ SMP call with IPIs.

So it's good to firstly understand if perf snapshot mode can meet the
requirement or not.

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
                   ` (4 preceding siblings ...)
  2021-04-26 17:54 ` [PATCH 0/4] " Mathieu Poirier
@ 2021-04-27 16:24 ` James Clark
  2021-04-28 11:30   ` James Clark
  2021-04-28 11:52   ` Daniel Kiss
  5 siblings, 2 replies; 35+ messages in thread
From: James Clark @ 2021-04-27 16:24 UTC (permalink / raw)
  To: Daniel Kiss, mathieu.poirier, suzuki.poulose, mike.leach,
	leo.yan, coresight, linux-arm-kernel
  Cc: denik



On 21/04/2021 15:04, Daniel Kiss wrote:
> This series adds a feature to ETR-PERF that sync the ETR buffer to perf
> periodically. This is really handy when the system wide trace is used
> because in this case the perf won't sync during the trace. In a per-thread
> setup the traced program might not go to the kernel frequvently enought
> to collect trace. Polling helps in both usecases. Can be used with strobing.
> Tuning polling period is challanging, I'm working on an additional patch
> that adds some metrics to help tune the polling period.

Hi Daniel,

Is the expectation that leaving the polling period at 0 should have no affect
on the amount of data collected vs not using this patch?

I've been running the octane 2 benchmark on Chrome on Juno and get these results:

  No patch:             I consistently get 130MB of trace across dozens of runs.
  Patch, polling = 0:   Run 1 - 6.1MB
                        Run 2 - 4.7MB
  Patch, polling = 100: 1600MB


Thanks
James

> 
> Daniel Kiss (4):
>   coresight: tmc-etr: Advance buffer pointer in sync buffer.
>   coresight: tmc-etr: Track perf handler.
>   coresight: etm-perf: Export etm_event_cpu_path.
>   coresight: Add ETR-PERF polling.
> 
>  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>  drivers/hwtracing/coresight/Makefile          |   2 +-
>  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
>  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
>  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>  .../coresight/coresight-etr-perf-polling.h    |  42 +++
>  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
>  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
>  9 files changed, 401 insertions(+), 4 deletions(-)
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer.
  2021-04-27 10:00       ` Suzuki K Poulose
@ 2021-04-28  2:34         ` Leo Yan
  0 siblings, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-04-28  2:34 UTC (permalink / raw)
  To: Suzuki K Poulose
  Cc: Daniel Kiss, mathieu.poirier, mike.leach, coresight,
	linux-arm-kernel, denik, Branislav Rankov

On Tue, Apr 27, 2021 at 11:00:51AM +0100, Suzuki Kuruppassery Poulose wrote:

[...]

> To make it more clear:
> 
>     CPU0: AUX RB (perf_output_handle_0) -> etr_perf0 ->  +---------+
>     CPU1: AUX RB (perf_output_handle_1) -> etr_perf1 ->  |etr_buf0 |
>     CPU2: AUX RB (perf_output_handle_2) -> etr_perf2 ->  |         |
>     CPU3: AUX RB (perf_output_handle_3) -> etr_perf3 ->  +---------+

Thanks for the clarification.

> > Simply to say, there have two layers for controlling ring buffer, one
> > layer is for perf AUX ring buffer, it mainly uses the structure
> > perf_output_handle to manage the ring buffer.  And in the ETR driver,
> > it uses structure etr_perf to manage the header pointer for copying
> > data into ETR buffer (tagged as "etr_buf").
> > 
> > ETR buffer is the single one, but the structures "perf_output_handle"
> > and "etr_perf" are per CPU.  We have multiple copies for the headers and
> 
> minor Correction, they are "per-event" to be precise. And there are events
> per-CPU in a system wide mode or task mode (but not per-thread mode). So,
> you are correct
> 
> > tails to manage a single buffer, but the problem is these multiple
> > copies have not been synced with each other.
> > 
> > > 2) Even more problematic is where we copy the AUX buffer content to.
> > > As mentioned above, we don't know which handle is going to be the last
> > > one to consume and we have a "etr_perf->head" that came from one of the
> > > handles and the "pages" that came from the first handle which created a
> > > etr_perf buffer. In sync_perf_buffer() we copy the hardware buffers to
> > > the "pages" (say of handle_0) with "etr_perf->head" (which could be from
> > > any other handle, say handle_2) and then we could return the number of bytes
> > > copied, which then is used to update the last handle (could be say
> > > handle_3), where there is no actual data copied.
> 
> This is not valid and am relieved that the driver is correct. The assumption
> that there is only one etr_perf per ETR is incorrect as
> pictured above.

Yeah, etr_perf is per event wise.

> > > To fix all of these issues, we must
> > > 1) Stop using etr_perf->head, and instead use the handle->head where we are
> > > called update_buffer on.
> > > 
> > > 2) Keep track of the "pages" that belong to a given "handle" and then use
> > > those pages to copy the data to the current handle we are called to update
> > > the buffer on.
> > 
> > The "pages" are only allocated once, even they are attached to multiple
> > handles.  I think the right way is to use the single structure
> 
> I assume you mean the pages in the etr_buf and not etr_perf right ?

Yes.

> > "etr_perf" and single "perf_output_handle" to manage the "pages", IOW,
> > if there have single buffer, then we just use one copy of header and
> > tail to manage it.
> 
> I think this is not needed and the way we do things are fine and the patch
> as such looks correct to me.
> 
> The perf_output_handle is per-event and nothing that we can combine with.
> etr_perf captures what the "ouput_handle" stands for and is something
> necessary for syncing the buffer.

Correct myself for one thing.  At beginning I wrongly understood the AUX
buffer is only allocated once but mapped into VMA for multiple times for all
events, but this is wrong.  In the file kernel/events/ring_buffer.c,
function rb_alloc_aux() clearly shows the AUX ring buffer is allocated
for every event.  So the buffer management is as below:

        +---------+
  CPU0: | AUX RB0 | (perf_output_handle_0) -> etr_perf0 ->  +---------+
        +---------+                                         |         |
  CPU1: | AUX RB1 | (perf_output_handle_1) -> etr_perf1 ->  |etr_buf0 |
        +---------+                                         |         |
  CPU2: | AUX RB2 | (perf_output_handle_2) -> etr_perf2 ->  |         |
        +---------+                                         |         |
  CPU3: | AUX RB3 | (perf_output_handle_3) -> etr_perf3 ->  +---------+
        +---------+                                              |
             ^---------------------------------------------------/
                           tmc_etr_sync_perf_buffer()

> Now coming back to this patch, I understand that the sync_perf could be
> called with the polling patches multiple times. But don't we do a
> perf_output_handle_end() each of the time we wake up ? (I haven't looked
> at the later patches yet).

Here "we" means the polling's work (or we can say the polling thread).
From my understanding for the patch of implementation the polling, it
adds the delayed work on work queue periodically, every time the work
resets the ETR and sync the trace data from etr_buf0 to AUX ring
buffer.  The target AUX ring buffer is fixed, it should be the first
registered AUX ring buffer when launch the session.

So the flow is:

  etr_perf_polling_worker()
    sink_ops(sink)->update_buffer()
      tmc_etr_sync_perf_buffer() -> Update 'etr_perf->head'
    tmc_etr_reset
    perf_aux_output_end(handle, size)  -> Update AUX ring buffer head
                                          and notify the perf tool
    perf_aux_output_begin()            -> Prepare for next recording

> I would expect:
> 
>   perf_aux_output_begin() -> update the etr_perf-> head
> 
>   when we sync the buffer, we do :
> 
>  Poll-> sync_buffer-> perf_aux_output_end() and perf_aux_output_begin() ->
> update etr_perf->head.

I understand your meaning, essentially "etr_perf->head" should keep
up the latest value of the AUX ring buffer's head.  So I think below
change is what you are proposing:

diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
index 8e8596dc75fa..daeeda00236e 100644
--- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
+++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
@@ -1567,6 +1567,7 @@ tmc_update_etr_buffer(struct coresight_device *csdev,
 	/* Insert barrier packets at the beginning, if there was an overflow */
 	if (lost)
 		tmc_etr_buf_insert_barrier_packet(etr_buf, offset);
+	etr_perf->head = handle->head;
 	tmc_etr_sync_perf_buffer(etr_perf, offset, size);
 
 	/*

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 16:24 ` James Clark
@ 2021-04-28 11:30   ` James Clark
  2021-04-28 11:52   ` Daniel Kiss
  1 sibling, 0 replies; 35+ messages in thread
From: James Clark @ 2021-04-28 11:30 UTC (permalink / raw)
  To: Daniel Kiss, mathieu.poirier, suzuki.poulose, mike.leach,
	leo.yan, coresight, linux-arm-kernel
  Cc: denik



On 27/04/2021 19:24, James Clark wrote:
> 
> 
> On 21/04/2021 15:04, Daniel Kiss wrote:
>> This series adds a feature to ETR-PERF that sync the ETR buffer to perf
>> periodically. This is really handy when the system wide trace is used
>> because in this case the perf won't sync during the trace. In a per-thread
>> setup the traced program might not go to the kernel frequvently enought
>> to collect trace. Polling helps in both usecases. Can be used with strobing.
>> Tuning polling period is challanging, I'm working on an additional patch
>> that adds some metrics to help tune the polling period.
> 
> Hi Daniel,
> 
> Is the expectation that leaving the polling period at 0 should have no affect
> on the amount of data collected vs not using this patch?
> 
> I've been running the octane 2 benchmark on Chrome on Juno and get these results:
> 
>   No patch:             I consistently get 130MB of trace across dozens of runs.
>   Patch, polling = 0:   Run 1 - 6.1MB
>                         Run 2 - 4.7MB

Sorry, these small files are because there is no AUXTRACE event at all, only built in events. It's not small because
less trace was collected.

>   Patch, polling = 100: 1600MB
> 
> 
> Thanks
> James
> 
>>
>> Daniel Kiss (4):
>>   coresight: tmc-etr: Advance buffer pointer in sync buffer.
>>   coresight: tmc-etr: Track perf handler.
>>   coresight: etm-perf: Export etm_event_cpu_path.
>>   coresight: Add ETR-PERF polling.
>>
>>  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>>  drivers/hwtracing/coresight/Makefile          |   2 +-
>>  .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
>>  .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
>>  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>>  .../coresight/coresight-etr-perf-polling.h    |  42 +++
>>  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>>  .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
>>  drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
>>  9 files changed, 401 insertions(+), 4 deletions(-)
>>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
>>

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 16:24 ` James Clark
  2021-04-28 11:30   ` James Clark
@ 2021-04-28 11:52   ` Daniel Kiss
  1 sibling, 0 replies; 35+ messages in thread
From: Daniel Kiss @ 2021-04-28 11:52 UTC (permalink / raw)
  To: James Clark
  Cc: mathieu.poirier, Suzuki Poulose, mike.leach, leo.yan, coresight,
	Linux ARM, denik



> On 27 Apr 2021, at 18:24, James Clark <James.Clark@arm.com> wrote:
> 
> 
> 
> On 21/04/2021 15:04, Daniel Kiss wrote:
>> This series adds a feature to ETR-PERF that sync the ETR buffer to perf
>> periodically. This is really handy when the system wide trace is used
>> because in this case the perf won't sync during the trace. In a per-thread
>> setup the traced program might not go to the kernel frequvently enought
>> to collect trace. Polling helps in both usecases. Can be used with strobing.
>> Tuning polling period is challanging, I'm working on an additional patch
>> that adds some metrics to help tune the polling period.
> 
> Hi Daniel,
> 
> Is the expectation that leaving the polling period at 0 should have no affect
> on the amount of data collected vs not using this patch?
> 
> I've been running the octane 2 benchmark on Chrome on Juno and get these results:
> 
>  No patch:             I consistently get 130MB of trace across dozens of runs.
>  Patch, polling = 0:   Run 1 - 6.1MB
>                        Run 2 - 4.7MB
>  Patch, polling = 100: 1600MB

Polling = 0 should not change the behaviour. 
We noticed this too, which is a bug. 
I’m going to drop the "coresight: tmc-etr: Track perf handler.”  patch, because it causes it.
Seems the csdev->refcnt and drvdata->perf_handle are out of sync 

Thanks,
Daniel



> Thanks
> James
> 
>> 
>> Daniel Kiss (4):
>>  coresight: tmc-etr: Advance buffer pointer in sync buffer.
>>  coresight: tmc-etr: Track perf handler.
>>  coresight: etm-perf: Export etm_event_cpu_path.
>>  coresight: Add ETR-PERF polling.
>> 
>> .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>> drivers/hwtracing/coresight/Makefile          |   2 +-
>> .../hwtracing/coresight/coresight-etm-perf.c  |  10 +-
>> .../hwtracing/coresight/coresight-etm-perf.h  |   1 +
>> .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>> .../coresight/coresight-etr-perf-polling.h    |  42 +++
>> .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>> .../hwtracing/coresight/coresight-tmc-etr.c   |  22 +-
>> drivers/hwtracing/coresight/coresight-tmc.h   |   2 +
>> 9 files changed, 401 insertions(+), 4 deletions(-)
>> create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>> create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
>> 

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-04-27 16:04         ` Leo Yan
@ 2021-05-05  6:46           ` Denis Nikitin
  2021-05-05 15:29             ` Mathieu Poirier
  0 siblings, 1 reply; 35+ messages in thread
From: Denis Nikitin @ 2021-05-05  6:46 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Tue, Apr 27, 2021 at 9:04 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:
>
> [...]
>
> > > 2) ETR polling ensures that more trace is collected across the entire
> > > trace session - seeking to reduce inconsistent capture volumes.
> >
> > I am not convinced disabling a sink to collect traces while an
> > event is active is the right way to go.  To me it will add (more) complexity to
> > the coresight subsystem for very little gains, if any.
> >
> > If I remember correctly Leo brought forward the exact same idea about a year ago
> > and after discussion, we all agreed the benefit would not be important enough to
> > offset the drawbacks.
> >
> > As usual I am open to discussion and my opinion is not set in stone.  But as I
> > mentioned I worry the feature will increase complexity in the driver and
> > produce dubious results.  And we also have to factor in usability which, as
> > Al pointed, out will be a problem.
>
> Just want to remind one thing for ETR polling.  From one perspective,
> the ETR polling mode is actually very similar with perf's snapshot
> mode.  E.g. we can use specific interval to send USR2 singal to perf
> tool to captcure CoreSight trace data, thus it also can record the
> trace data continuously.
>
> I can see a benefit from ETR polling mode is it might introduce less
> overhead than perf snapshot mode.  The kernel's mechanism (workqueue
> or kernel thread) will be much efficiency than perf's signal handling
> + SMP call with IPIs.
>
> So it's good to firstly understand if perf snapshot mode can meet the
> requirement or not.

We evaluated the patch on Chrome OS and I can confirm that the quality
of AutoFDO profiles greatly improved with the ETR polling.
Tested with per-thread and system-wide mode.

Without ETR polling the size of the collected ETM data was very
inconsistent on the same workload and could vary by a factor of two.
This, in turn, affects the quality of the AutoFDO profiles generated from ETM.
With ETR polling the data size became pretty stable.
Performance evaluation shows a similar consistency in performance gain
of AutoFDO optimization.
This, I think, supports the idea that data collection right now is sensitive
to the process scheduling and can be improved with ETR polling.

For the system-wide mode particularly we didn't see any other alternatives
to collect data periodically on a long-running workload.
We haven't tested snapshot mode though. The idea sounds interesting.
But small runtime overhead is crucial for the sampling profiler in the field
and if there is a noticeable difference we would incline towards the
ETR polling.

Thanks,
Denis

>
> Thanks,
> Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 4/4] coresight: Add ETR-PERF polling.
  2021-04-21 12:04 ` [PATCH 4/4] coresight: Add ETR-PERF polling Daniel Kiss
  2021-04-26  1:18   ` Leo Yan
@ 2021-05-05  7:21   ` Denis Nikitin
  1 sibling, 0 replies; 35+ messages in thread
From: Denis Nikitin @ 2021-05-05  7:21 UTC (permalink / raw)
  To: Daniel Kiss
  Cc: mathieu.poirier, suzuki.poulose, mike.leach, leo.yan, coresight,
	linux-arm-kernel, Branislav Rankov

On Wed, Apr 21, 2021 at 02:04:13PM +0200, Daniel Kiss wrote:
> ETR might fill up the buffer sooner than an event makes perf to trigger
> the synchronisation especially in system wide trace. Polling runs
> periodically to sync the ETR buffer. Period is configurable via sysfs,
> disabled by default.
> 
> Signed-off-by: Daniel Kiss <daniel.kiss@arm.com>
> Signed-off-by: Branislav Rankov <Branislav.Rankov@arm.com>

Tested-by: Denis Nikitin <denik@chromium.org>

Thanks,
Denis

> ---
>  .../testing/sysfs-bus-coresight-devices-tmc   |   8 +
>  drivers/hwtracing/coresight/Makefile          |   2 +-
>  .../hwtracing/coresight/coresight-etm-perf.c  |   8 +
>  .../coresight/coresight-etr-perf-polling.c    | 316 ++++++++++++++++++
>  .../coresight/coresight-etr-perf-polling.h    |  42 +++
>  .../hwtracing/coresight/coresight-tmc-core.c  |   2 +
>  .../hwtracing/coresight/coresight-tmc-etr.c   |   9 +
>  7 files changed, 386 insertions(+), 1 deletion(-)
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.c
>  create mode 100644 drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> 
> diff --git a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> index 6aa527296c710..4ca7af22a3686 100644
> --- a/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> +++ b/Documentation/ABI/testing/sysfs-bus-coresight-devices-tmc
> @@ -91,3 +91,11 @@ Contact:	Mathieu Poirier <mathieu.poirier@linaro.org>
>  Description:	(RW) Size of the trace buffer for TMC-ETR when used in SYSFS
>  		mode. Writable only for TMC-ETR configurations. The value
>  		should be aligned to the kernel pagesize.
> +
> +What:		/sys/bus/coresight/devices/<memory_map>.tmc/polling/period
> +Date:		April 2021
> +KernelVersion:	5.13
> +Contact:	Daniel Kiss <daniel.kiss@arm.com>
> +Description:	(RW) Time in milliseconds when the TMC-ETR is synced.
> +		Default value is 0, means the feature is disabled.
> +		Writable only for TMC-ETR configurations.
> diff --git a/drivers/hwtracing/coresight/Makefile b/drivers/hwtracing/coresight/Makefile
> index d60816509755c..4df90b71d98cd 100644
> --- a/drivers/hwtracing/coresight/Makefile
> +++ b/drivers/hwtracing/coresight/Makefile
> @@ -4,7 +4,7 @@
>  #
>  obj-$(CONFIG_CORESIGHT) += coresight.o
>  coresight-y := coresight-core.o  coresight-etm-perf.o coresight-platform.o \
> -		coresight-sysfs.o
> +		coresight-sysfs.o  coresight-etr-perf-polling.o
>  obj-$(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) += coresight-tmc.o
>  coresight-tmc-y := coresight-tmc-core.o coresight-tmc-etf.o \
>  		      coresight-tmc-etr.o
> diff --git a/drivers/hwtracing/coresight/coresight-etm-perf.c b/drivers/hwtracing/coresight/coresight-etm-perf.c
> index 78a55fc2bcab5..910a99944eea8 100644
> --- a/drivers/hwtracing/coresight/coresight-etm-perf.c
> +++ b/drivers/hwtracing/coresight/coresight-etm-perf.c
> @@ -19,6 +19,7 @@
>  #include <linux/workqueue.h>
>  
>  #include "coresight-etm-perf.h"
> +#include "coresight-etr-perf-polling.h"
>  #include "coresight-priv.h"
>  
>  static struct pmu etm_pmu;
> @@ -438,6 +439,8 @@ static void etm_event_start(struct perf_event *event, int flags)
>  	/* Tell the perf core the event is alive */
>  	event->hw.state = 0;
>  
> +	etr_perf_polling_event_start(event, event_data, handle);
> +
>  	/* Finally enable the tracer */
>  	if (source_ops(csdev)->enable(csdev, event, CS_MODE_PERF))
>  		goto fail_disable_path;
> @@ -497,6 +500,8 @@ static void etm_event_stop(struct perf_event *event, int mode)
>  	if (!sink)
>  		return;
>  
> +	etr_perf_polling_event_stop(event, event_data);
> +
>  	/* stop tracer */
>  	source_ops(csdev)->disable(csdev, event);
>  
> @@ -741,6 +746,8 @@ int __init etm_perf_init(void)
>  	etm_pmu.addr_filters_validate	= etm_addr_filters_validate;
>  	etm_pmu.nr_addr_filters		= ETM_ADDR_CMP_MAX;
>  
> +	etr_perf_polling_init();
> +
>  	ret = perf_pmu_register(&etm_pmu, CORESIGHT_ETM_PMU_NAME, -1);
>  	if (ret == 0)
>  		etm_perf_up = true;
> @@ -750,5 +757,6 @@ int __init etm_perf_init(void)
>  
>  void __exit etm_perf_exit(void)
>  {
> +	etr_perf_polling_exit();
>  	perf_pmu_unregister(&etm_pmu);
>  }
> diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.c b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> new file mode 100644
> index 0000000000000..aa0352908873a
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.c
> @@ -0,0 +1,316 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright(C) 2021 Arm Limited. All rights reserved.
> + * Author: Daniel Kiss <daniel.kiss@arm.com>
> + */
> +
> +#include <linux/coresight.h>
> +#include <linux/coresight-pmu.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/init.h>
> +#include <linux/list.h>
> +#include <linux/percpu-defs.h>
> +#include <linux/perf_event.h>
> +#include <linux/slab.h>
> +#include <linux/stringhash.h>
> +#include <linux/types.h>
> +#include <linux/workqueue.h>
> +
> +#include "coresight-etr-perf-polling.h"
> +#include "coresight-priv.h"
> +#include "coresight-tmc.h"
> +
> +#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
> +    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)
> +
> +struct polling_event_list {
> +	struct perf_event *perf_event;
> +	struct etm_event_data *etm_event_data;
> +	struct perf_output_handle *ctx_handle;
> +	void (*tmc_etr_reset_hw)(struct tmc_drvdata *);
> +	struct list_head list;
> +};
> +
> +struct polling {
> +	int cpu;
> +	struct list_head polled_events;
> +	struct delayed_work delayed_work;
> +};
> +
> +static atomic_t period;
> +static spinlock_t spinlock_re;
> +static struct list_head registered_events;
> +
> +static DEFINE_PER_CPU(struct polling, polling);
> +
> +static ssize_t period_show(struct device *dev, struct device_attribute *attr,
> +			   char *buf)
> +{
> +	int temp;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return -EPERM;
> +
> +	temp = atomic_read(&period);
> +	return sprintf(buf, "%i\n", temp);
> +}
> +
> +static ssize_t period_store(struct device *dev, struct device_attribute *attr,
> +			    const char *buf, size_t count)
> +{
> +	int temp = 0;
> +	struct tmc_drvdata *drvdata = dev_get_drvdata(dev->parent);
> +
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return -EPERM;
> +
> +	if ((1 == sscanf(buf, "%i", &temp)) && (temp >= 0))
> +		atomic_set(&period, temp);
> +	return count;
> +}
> +
> +static DEVICE_ATTR_RW(period);
> +
> +static struct attribute *coresight_tmc_polling_attrs[] = {
> +	&dev_attr_period.attr,
> +	NULL,
> +};
> +const struct attribute_group coresight_tmc_polling_group = {
> +	.attrs = coresight_tmc_polling_attrs,
> +	.name = "polling",
> +};
> +EXPORT_SYMBOL_GPL(coresight_tmc_polling_group);
> +
> +static inline void polling_sched_worker(struct polling *p)
> +{
> +	int tickrate = atomic_read(&period);
> +	if (!list_empty(&p->polled_events) && (tickrate > 0))
> +		schedule_delayed_work_on(p->cpu, &p->delayed_work,
> +					 msecs_to_jiffies(tickrate));
> +}
> +
> +static inline bool is_etr_related(struct etm_event_data *etm_event_data, int cpu)
> +{
> +	struct list_head *path;
> +	struct coresight_device *sink;
> +	struct tmc_drvdata *drvdata;
> +	path = etm_event_cpu_path(etm_event_data, cpu);
> +	if (WARN_ON(!path))
> +		return false;
> +	sink = coresight_get_sink(path);
> +	if (WARN_ON(!sink))
> +		return false;
> +	drvdata = dev_get_drvdata(sink->dev.parent);
> +	if (drvdata->config_type != TMC_CONFIG_TYPE_ETR)
> +		return false;
> +	return true;
> +}
> +
> +/*
> + * Adds the event to the polled events list.
> + */
> +void etr_perf_polling_event_start(struct perf_event *event,
> +				  struct etm_event_data *etm_event_data,
> +				  struct perf_output_handle *ctx_handle)
> +{
> +	int cpu = smp_processor_id();
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +	struct polling_event_list *element;
> +	struct list_head *i, *tmp;
> +
> +	if (!is_etr_related(etm_event_data, cpu))
> +		return;
> +
> +	spin_lock(&spinlock_re);
> +	list_for_each_safe (i, tmp, &registered_events) {
> +		element = list_entry(i, struct polling_event_list, list);
> +		if (element->ctx_handle == ctx_handle) {
> +			element->perf_event = event;
> +			element->etm_event_data = etm_event_data;
> +			list_del(&element->list);
> +			spin_unlock(&spinlock_re);
> +			list_add(&element->list, &p->polled_events);
> +			polling_sched_worker(p);
> +			return;
> +		}
> +	}
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_event_start);
> +
> +/*
> + * Removes the event from the to be polled events list.
> + */
> +void etr_perf_polling_event_stop(struct perf_event *event,
> +				 struct etm_event_data *etm_event_data)
> +{
> +	int cpu = smp_processor_id();
> +	struct list_head *i, *tmp;
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +
> +	if (!is_etr_related(etm_event_data, cpu))
> +		return;
> +
> +	list_for_each_safe (i, tmp, &p->polled_events) {
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +		if (element->perf_event == event) {
> +			list_del(&element->list);
> +			element->perf_event = NULL;
> +			element->etm_event_data = NULL;
> +			spin_lock(&spinlock_re);
> +			list_add(&element->list, &registered_events);
> +			spin_unlock(&spinlock_re);
> +			if (list_empty(&p->polled_events)) {
> +				cancel_delayed_work(&p->delayed_work);
> +			}
> +			return;
> +		}
> +	}
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_event_stop);
> +
> +/*
> + * The polling worker is a workqueue job which is periodically
> + * woken up to update the perf aux buffer from the etr shrink.
> + */
> +static void etr_perf_polling_worker(struct work_struct *work)
> +{
> +	unsigned long flags;
> +	int cpu = smp_processor_id();
> +	struct polling *p = per_cpu_ptr(&polling, cpu);
> +	struct list_head *i, *tmp;
> +
> +	if (!atomic_read(&period))
> +		return;
> +
> +	/*
> +	 * Scheduling would do the same from the perf hooks,
> +	 * this should be done in one go.
> +	 */
> +	local_irq_save(flags);
> +	preempt_disable();
> +	/* Perf requires rcu lock. */
> +	rcu_read_lock();
> +
> +	polling_sched_worker(p);
> +
> +	list_for_each_safe (i, tmp, &p->polled_events) {
> +		struct list_head *path;
> +		struct coresight_device *sink;
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +
> +		path = etm_event_cpu_path(element->etm_event_data, cpu);
> +		if (WARN_ON(!path))
> +			continue;
> +		sink = coresight_get_sink(path);
> +		if (WARN_ON(!sink))
> +			continue;
> +		if (sink_ops(sink)->update_buffer) {
> +			int size, refcnt;
> +			struct tmc_drvdata *drvdata = dev_get_drvdata(sink->dev.parent);
> +
> +			/*
> +			 * Act as now we are the only users of the sink. Due to the locks
> +			 * we are safe.
> +			 */
> +			refcnt = atomic_xchg(sink->refcnt, 1);
> +			size = sink_ops(sink)->update_buffer(
> +				sink, element->ctx_handle,
> +				element->etm_event_data->snk_config);
> +			refcnt = atomic_xchg(sink->refcnt, refcnt);
> +			/*
> +			 * Restart the trace.
> +			 */
> +			if (element->tmc_etr_reset_hw)
> +				element->tmc_etr_reset_hw(drvdata);
> +
> +			WARN_ON(size < 0);
> +			if (size > 0) {
> +				struct etm_event_data *new_event_data;
> +
> +				perf_aux_output_end(element->ctx_handle, size);
> +				new_event_data = perf_aux_output_begin(
> +					element->ctx_handle,
> +					element->perf_event);
> +				if (WARN_ON(new_event_data == NULL))
> +					continue;
> +				element->etm_event_data = new_event_data;
> +				WARN_ON(new_event_data->snk_config !=
> +					element->etm_event_data->snk_config);
> +			}
> +		}
> +	}
> +
> +	rcu_read_unlock();
> +	preempt_enable();
> +	local_irq_restore(flags);
> +}
> +
> +void etr_perf_polling_handle_register(struct perf_output_handle *handle,
> +				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata))
> +{
> +	struct polling_event_list *element;
> +
> +	element = kmalloc(sizeof(*element), GFP_KERNEL);
> +	if (WARN_ON(!element))
> +		return;
> +	memset(element, 0, sizeof(*element));
> +	element->ctx_handle = handle;
> +	element->tmc_etr_reset_hw = tmc_etr_reset_hw;
> +	spin_lock(&spinlock_re);
> +	list_add(&element->list, &registered_events);
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_handle_register);
> +
> +void etr_perf_polling_handle_deregister(struct perf_output_handle *handle)
> +{
> +	struct list_head *i, *tmp;
> +
> +	spin_lock(&spinlock_re);
> +	list_for_each_safe (i, tmp, &registered_events) {
> +		struct polling_event_list *element =
> +			list_entry(i, struct polling_event_list, list);
> +		if (element->ctx_handle == handle) {
> +			list_del(&element->list);
> +			spin_unlock(&spinlock_re);
> +			kfree(element);
> +			return;
> +		}
> +	}
> +	spin_unlock(&spinlock_re);
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_handle_deregister);
> +
> +void etr_perf_polling_init(void)
> +{
> +	int cpu;
> +	spin_lock_init(&spinlock_re);
> +	INIT_LIST_HEAD(&registered_events);
> +	atomic_set(&period, 0);
> +	for_each_possible_cpu (cpu) {
> +		struct polling *p = per_cpu_ptr(&polling, cpu);
> +		p->cpu = cpu;
> +		INIT_LIST_HEAD(&p->polled_events);
> +		INIT_DELAYED_WORK(&p->delayed_work, etr_perf_polling_worker);
> +	}
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_init);
> +
> +void etr_perf_polling_exit(void)
> +{
> +	int cpu;
> +	for_each_possible_cpu (cpu) {
> +		struct polling *p = per_cpu_ptr(&polling, cpu);
> +		cancel_delayed_work_sync(&p->delayed_work);
> +		WARN_ON(!list_empty(&p->polled_events));
> +	}
> +	WARN_ON(!list_empty(&registered_events));
> +}
> +EXPORT_SYMBOL_GPL(etr_perf_polling_exit);
> +
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-etr-perf-polling.h b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> new file mode 100644
> index 0000000000000..5917e1fa408bb
> --- /dev/null
> +++ b/drivers/hwtracing/coresight/coresight-etr-perf-polling.h
> @@ -0,0 +1,42 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +/*
> + * Copyright(C) 2021 Arm Limited. All rights reserved.
> + * Author: Daniel Kiss <daniel.kiss@arm.com>
> + */
> +
> +#ifndef _CORESIGHT_ETM_PERF_POLLING_H
> +#define _CORESIGHT_ETM_PERF_POLLING_H
> +
> +#include <linux/coresight.h>
> +#include <linux/perf_event.h>
> +#include "coresight-etm-perf.h"
> +#include "coresight-tmc.h"
> +
> +#if defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC) || \
> +    defined(CONFIG_CORESIGHT_LINK_AND_SINK_TMC_MODULE)
> +
> +void etr_perf_polling_init(void);
> +void etr_perf_polling_exit(void);
> +void etr_perf_polling_handle_register(struct perf_output_handle *handle,
> +				      void (*tmc_etr_reset_hw)(struct tmc_drvdata *drvdata));
> +void etr_perf_polling_handle_deregister(struct perf_output_handle *handle);
> +void etr_perf_polling_event_start(struct perf_event *event,
> +				  struct etm_event_data *etm_event_data,
> +				  struct perf_output_handle *ctx_handle);
> +void etr_perf_polling_event_stop(struct perf_event *event,
> +				 struct etm_event_data *etm_event_data);
> +
> +extern const struct attribute_group coresight_tmc_polling_group;
> +#define CORESIGHT_TMP_POLLING_GROUP &coresight_tmc_polling_group,
> +
> +#else /* !CONFIG_CORESIGHT_LINK_AND_SINK_TMC */
> +#define etr_perf_polling_init()
> +#define etr_perf_polling_exit()
> +#define etr_perf_polling_handle_register(...)
> +#define etr_perf_polling_handle_deregister(...)
> +#define etr_perf_polling_event_start(...)
> +#define etr_perf_polling_event_stop(...)
> +#define CORESIGHT_TMP_POLLING_GROUP
> +#endif
> +
> +#endif
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-core.c b/drivers/hwtracing/coresight/coresight-tmc-core.c
> index 74c6323d4d6ab..51e705ef3ffa3 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-core.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-core.c
> @@ -26,6 +26,7 @@
>  
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
> +#include "coresight-etr-perf-polling.h"
>  
>  DEFINE_CORESIGHT_DEVLIST(etb_devs, "tmc_etb");
>  DEFINE_CORESIGHT_DEVLIST(etf_devs, "tmc_etf");
> @@ -365,6 +366,7 @@ static const struct attribute_group coresight_tmc_mgmt_group = {
>  static const struct attribute_group *coresight_tmc_groups[] = {
>  	&coresight_tmc_group,
>  	&coresight_tmc_mgmt_group,
> +	CORESIGHT_TMP_POLLING_GROUP
>  	NULL,
>  };
>  
> diff --git a/drivers/hwtracing/coresight/coresight-tmc-etr.c b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> index bf9f6311d8663..021b594e38e71 100644
> --- a/drivers/hwtracing/coresight/coresight-tmc-etr.c
> +++ b/drivers/hwtracing/coresight/coresight-tmc-etr.c
> @@ -16,6 +16,7 @@
>  #include <linux/vmalloc.h>
>  #include "coresight-catu.h"
>  #include "coresight-etm-perf.h"
> +#include "coresight-etr-perf-polling.h"
>  #include "coresight-priv.h"
>  #include "coresight-tmc.h"
>  
> @@ -1139,6 +1140,12 @@ void tmc_etr_disable_hw(struct tmc_drvdata *drvdata)
>  	drvdata->etr_buf = NULL;
>  }
>  
> +static void tmc_etr_reset_hw(struct tmc_drvdata *drvdata)
> +{
> +	__tmc_etr_disable_hw(drvdata);
> +	__tmc_etr_enable_hw(drvdata);
> +}
> +
>  static int tmc_enable_etr_sink_sysfs(struct coresight_device *csdev)
>  {
>  	int ret = 0;
> @@ -1630,6 +1637,7 @@ static int tmc_enable_etr_sink_perf(struct coresight_device *csdev, void *data)
>  		drvdata->mode = CS_MODE_PERF;
>  		drvdata->perf_buf = etr_perf->etr_buf;
>  		drvdata->perf_handle = handle;
> +		etr_perf_polling_handle_register(handle, tmc_etr_reset_hw);
>  		atomic_inc(csdev->refcnt);
>  	}
>  
> @@ -1677,6 +1685,7 @@ static int tmc_disable_etr_sink(struct coresight_device *csdev)
>  	drvdata->mode = CS_MODE_DISABLED;
>  	/* Reset perf specific data */
>  	drvdata->perf_buf = NULL;
> +	etr_perf_polling_handle_deregister(drvdata->perf_handle);
>  	drvdata->perf_handle = NULL;
>  
>  	spin_unlock_irqrestore(&drvdata->spinlock, flags);
> -- 
> 2.25.1
> 
> _______________________________________________
> CoreSight mailing list
> CoreSight@lists.linaro.org
> https://lists.linaro.org/mailman/listinfo/coresight

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-05  6:46           ` Denis Nikitin
@ 2021-05-05 15:29             ` Mathieu Poirier
  2021-05-14  9:02               ` Denis Nikitin
  0 siblings, 1 reply; 35+ messages in thread
From: Mathieu Poirier @ 2021-05-05 15:29 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Leo Yan, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Tue, May 04, 2021 at 11:46:20PM -0700, Denis Nikitin wrote:
> On Tue, Apr 27, 2021 at 9:04 AM Leo Yan <leo.yan@linaro.org> wrote:
> >
> > On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:
> >
> > [...]
> >
> > > > 2) ETR polling ensures that more trace is collected across the entire
> > > > trace session - seeking to reduce inconsistent capture volumes.
> > >
> > > I am not convinced disabling a sink to collect traces while an
> > > event is active is the right way to go.  To me it will add (more) complexity to
> > > the coresight subsystem for very little gains, if any.
> > >
> > > If I remember correctly Leo brought forward the exact same idea about a year ago
> > > and after discussion, we all agreed the benefit would not be important enough to
> > > offset the drawbacks.
> > >
> > > As usual I am open to discussion and my opinion is not set in stone.  But as I
> > > mentioned I worry the feature will increase complexity in the driver and
> > > produce dubious results.  And we also have to factor in usability which, as
> > > Al pointed, out will be a problem.
> >
> > Just want to remind one thing for ETR polling.  From one perspective,
> > the ETR polling mode is actually very similar with perf's snapshot
> > mode.  E.g. we can use specific interval to send USR2 singal to perf
> > tool to captcure CoreSight trace data, thus it also can record the
> > trace data continuously.
> >
> > I can see a benefit from ETR polling mode is it might introduce less
> > overhead than perf snapshot mode.  The kernel's mechanism (workqueue
> > or kernel thread) will be much efficiency than perf's signal handling
> > + SMP call with IPIs.
> >
> > So it's good to firstly understand if perf snapshot mode can meet the
> > requirement or not.
> 
> We evaluated the patch on Chrome OS and I can confirm that the quality
> of AutoFDO profiles greatly improved with the ETR polling.
> Tested with per-thread and system-wide mode.
> 
> Without ETR polling the size of the collected ETM data was very
> inconsistent on the same workload and could vary by a factor of two.
> This, in turn, affects the quality of the AutoFDO profiles generated from ETM.
> With ETR polling the data size became pretty stable.
> Performance evaluation shows a similar consistency in performance gain
> of AutoFDO optimization.
> This, I think, supports the idea that data collection right now is sensitive
> to the process scheduling and can be improved with ETR polling.
> 
> For the system-wide mode particularly we didn't see any other alternatives
> to collect data periodically on a long-running workload.
> We haven't tested snapshot mode though. The idea sounds interesting.
> But small runtime overhead is crucial for the sampling profiler in the field
> and if there is a noticeable difference we would incline towards the
> ETR polling.

Please see if Leo's approach[1], or any kind of extension to the current
snapshot feature, would be a viable solution.  Reusing or extending code that is
already there is always a better option.

Thanks,
Mathieu

[1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html

> 
> Thanks,
> Denis
> 
> >
> > Thanks,
> > Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-05 15:29             ` Mathieu Poirier
@ 2021-05-14  9:02               ` Denis Nikitin
  2021-05-14 16:16                 ` Mike Leach
                                   ` (2 more replies)
  0 siblings, 3 replies; 35+ messages in thread
From: Denis Nikitin @ 2021-05-14  9:02 UTC (permalink / raw)
  To: Mathieu Poirier
  Cc: Leo Yan, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Wed, May 5, 2021 at 8:29 AM Mathieu Poirier
<mathieu.poirier@linaro.org> wrote:
>
> On Tue, May 04, 2021 at 11:46:20PM -0700, Denis Nikitin wrote:
> > On Tue, Apr 27, 2021 at 9:04 AM Leo Yan <leo.yan@linaro.org> wrote:
> > >
> > > On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:
> > >
> > > [...]
> > >
> > > > > 2) ETR polling ensures that more trace is collected across the entire
> > > > > trace session - seeking to reduce inconsistent capture volumes.
> > > >
> > > > I am not convinced disabling a sink to collect traces while an
> > > > event is active is the right way to go.  To me it will add (more) complexity to
> > > > the coresight subsystem for very little gains, if any.
> > > >
> > > > If I remember correctly Leo brought forward the exact same idea about a year ago
> > > > and after discussion, we all agreed the benefit would not be important enough to
> > > > offset the drawbacks.
> > > >
> > > > As usual I am open to discussion and my opinion is not set in stone.  But as I
> > > > mentioned I worry the feature will increase complexity in the driver and
> > > > produce dubious results.  And we also have to factor in usability which, as
> > > > Al pointed, out will be a problem.
> > >
> > > Just want to remind one thing for ETR polling.  From one perspective,
> > > the ETR polling mode is actually very similar with perf's snapshot
> > > mode.  E.g. we can use specific interval to send USR2 singal to perf
> > > tool to captcure CoreSight trace data, thus it also can record the
> > > trace data continuously.
> > >
> > > I can see a benefit from ETR polling mode is it might introduce less
> > > overhead than perf snapshot mode.  The kernel's mechanism (workqueue
> > > or kernel thread) will be much efficiency than perf's signal handling
> > > + SMP call with IPIs.
> > >
> > > So it's good to firstly understand if perf snapshot mode can meet the
> > > requirement or not.
> >
> > We evaluated the patch on Chrome OS and I can confirm that the quality
> > of AutoFDO profiles greatly improved with the ETR polling.
> > Tested with per-thread and system-wide mode.
> >
> > Without ETR polling the size of the collected ETM data was very
> > inconsistent on the same workload and could vary by a factor of two.
> > This, in turn, affects the quality of the AutoFDO profiles generated from ETM.
> > With ETR polling the data size became pretty stable.
> > Performance evaluation shows a similar consistency in performance gain
> > of AutoFDO optimization.
> > This, I think, supports the idea that data collection right now is sensitive
> > to the process scheduling and can be improved with ETR polling.
> >
> > For the system-wide mode particularly we didn't see any other alternatives
> > to collect data periodically on a long-running workload.
> > We haven't tested snapshot mode though. The idea sounds interesting.
> > But small runtime overhead is crucial for the sampling profiler in the field
> > and if there is a noticeable difference we would incline towards the
> > ETR polling.
>
> Please see if Leo's approach[1], or any kind of extension to the current
> snapshot feature, would be a viable solution.  Reusing or extending code that is
> already there is always a better option.
>
> Thanks,
> Mathieu
>
> [1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html
>

Hi Mattieu and Leo,

I did some evaluation of the snapshot mode.

Performance overhead is indeed higher than with ETR polling patch.
Here are some numbers for comparison (measured on browser
Speedometer2 benchmark):
Runtime overhead of ETM tracing with ETR poll period 100ms is less than
0.5%. Snapshot mode gives 2.1%.
With 10ms period I see 4.6% with ETR polling and 22% in snapshot mode.

We could probably utilize the ETM strobing feature and reduce frequency
of data collection but I see a problem when I'm using both.
Within a minute of profiling the ETM generates a reasonable profile size
(with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
But then the size grows unproportionally.
With a 4 minute run I got a 6.3GB profile.
I don't see such a problem with the ETR polling patch.

Leo, could you please take a look at this problem?

Thanks,
Denis

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-14  9:02               ` Denis Nikitin
@ 2021-05-14 16:16                 ` Mike Leach
  2021-05-18 14:00                 ` Leo Yan
  2021-05-23  8:45                 ` Leo Yan
  2 siblings, 0 replies; 35+ messages in thread
From: Mike Leach @ 2021-05-14 16:16 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Leo Yan, coresight, linux-arm-kernel, Daniel Kiss

Hi,

On Fri, 14 May 2021 at 10:02, Denis Nikitin <denik@google.com> wrote:
>
> On Wed, May 5, 2021 at 8:29 AM Mathieu Poirier
> <mathieu.poirier@linaro.org> wrote:
> >
> > On Tue, May 04, 2021 at 11:46:20PM -0700, Denis Nikitin wrote:
> > > On Tue, Apr 27, 2021 at 9:04 AM Leo Yan <leo.yan@linaro.org> wrote:
> > > >
> > > > On Tue, Apr 27, 2021 at 09:47:46AM -0600, Mathieu Poirier wrote:
> > > >
> > > > [...]
> > > >
> > > > > > 2) ETR polling ensures that more trace is collected across the entire
> > > > > > trace session - seeking to reduce inconsistent capture volumes.
> > > > >
> > > > > I am not convinced disabling a sink to collect traces while an
> > > > > event is active is the right way to go.  To me it will add (more) complexity to
> > > > > the coresight subsystem for very little gains, if any.
> > > > >
> > > > > If I remember correctly Leo brought forward the exact same idea about a year ago
> > > > > and after discussion, we all agreed the benefit would not be important enough to
> > > > > offset the drawbacks.
> > > > >
> > > > > As usual I am open to discussion and my opinion is not set in stone.  But as I
> > > > > mentioned I worry the feature will increase complexity in the driver and
> > > > > produce dubious results.  And we also have to factor in usability which, as
> > > > > Al pointed, out will be a problem.
> > > >
> > > > Just want to remind one thing for ETR polling.  From one perspective,
> > > > the ETR polling mode is actually very similar with perf's snapshot
> > > > mode.  E.g. we can use specific interval to send USR2 singal to perf
> > > > tool to captcure CoreSight trace data, thus it also can record the
> > > > trace data continuously.
> > > >
> > > > I can see a benefit from ETR polling mode is it might introduce less
> > > > overhead than perf snapshot mode.  The kernel's mechanism (workqueue
> > > > or kernel thread) will be much efficiency than perf's signal handling
> > > > + SMP call with IPIs.
> > > >
> > > > So it's good to firstly understand if perf snapshot mode can meet the
> > > > requirement or not.
> > >
> > > We evaluated the patch on Chrome OS and I can confirm that the quality
> > > of AutoFDO profiles greatly improved with the ETR polling.
> > > Tested with per-thread and system-wide mode.
> > >
> > > Without ETR polling the size of the collected ETM data was very
> > > inconsistent on the same workload and could vary by a factor of two.
> > > This, in turn, affects the quality of the AutoFDO profiles generated from ETM.
> > > With ETR polling the data size became pretty stable.
> > > Performance evaluation shows a similar consistency in performance gain
> > > of AutoFDO optimization.
> > > This, I think, supports the idea that data collection right now is sensitive
> > > to the process scheduling and can be improved with ETR polling.
> > >
> > > For the system-wide mode particularly we didn't see any other alternatives
> > > to collect data periodically on a long-running workload.
> > > We haven't tested snapshot mode though. The idea sounds interesting.
> > > But small runtime overhead is crucial for the sampling profiler in the field
> > > and if there is a noticeable difference we would incline towards the
> > > ETR polling.
> >
> > Please see if Leo's approach[1], or any kind of extension to the current
> > snapshot feature, would be a viable solution.  Reusing or extending code that is
> > already there is always a better option.
> >
> > Thanks,
> > Mathieu
> >
> > [1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html
> >
>
> Hi Mattieu and Leo,
>
> I did some evaluation of the snapshot mode.
>
> Performance overhead is indeed higher than with ETR polling patch.
> Here are some numbers for comparison (measured on browser
> Speedometer2 benchmark):
> Runtime overhead of ETM tracing with ETR poll period 100ms is less than
> 0.5%. Snapshot mode gives 2.1%.
> With 10ms period I see 4.6% with ETR polling and 22% in snapshot mode.
>
> We could probably utilize the ETM strobing feature and reduce frequency
> of data collection but I see a problem when I'm using both.
> Within a minute of profiling the ETM generates a reasonable profile size
> (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> But then the size grows unproportionally.
> With a 4 minute run I got a 6.3GB profile.
> I don't see such a problem with the ETR polling patch.
>
> Leo, could you please take a look at this problem?
>
> Thanks,
> Denis

I do think this patchset presents a valuable feature to add to the
CoreSight support.

This closes a gap on systems where we do not have an interrupt to
trigger on a full sink, and is of evident value to users, giving them
predictable trace sessions, with an even spread of trace across the
session, as has been requested a number of times.
I note the concern regarding stopping trace while the event is active
- but is this not  precisely what we do with ETE/TRBE when it hits an
interrupt? The hardware is stopped and the IRQ updates the perf aux
buffer, and allows the perf core to decide what to do next. In this
patchset we are simply replacing the IRQ with a timer.

Though ETR does not have a direct interrupt - it does have a full
flag, which can be - in a system dependent way - wired through the CTI
and into a PE interrupt. This would give the equivalent operation of
TRBE. Now I do not suggest pursuing this course - as this is per
system architecture dependent - the timer is a better fit for a
generic solution.

The main issue I see with this solution - which is shared with
ETE/TRBE, is that for accurate decode we must know the boundaries of
each captured block in the AUXTRACE buffer in order to reset the
decoder. As has been discussed elsewhere, this work is in hand.

While there are clearly technical implementation details to be fixed
in this set, which others have commented on in the subsequent patches.
I believe in principle it is sound.

Regards

Mike


-- 
Mike Leach
Principal Engineer, ARM Ltd.
Manchester Design Centre. UK

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-14  9:02               ` Denis Nikitin
  2021-05-14 16:16                 ` Mike Leach
@ 2021-05-18 14:00                 ` Leo Yan
  2021-05-18 14:14                   ` Leo Yan
                                     ` (2 more replies)
  2021-05-23  8:45                 ` Leo Yan
  2 siblings, 3 replies; 35+ messages in thread
From: Leo Yan @ 2021-05-18 14:00 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

Hi Denis,

On Fri, May 14, 2021 at 02:02:25AM -0700, Denis Nikitin wrote:

[...]

> Hi Mattieu and Leo,
> 
> I did some evaluation of the snapshot mode.

Thanks a lot for the evaluation and share back the result.

> Performance overhead is indeed higher than with ETR polling patch.
> Here are some numbers for comparison (measured on browser
> Speedometer2 benchmark):
> Runtime overhead of ETM tracing with ETR poll period 100ms is less than
> 0.5%. Snapshot mode gives 2.1%.
> With 10ms period I see 4.6% with ETR polling and 22% in snapshot mode.

It's not expected that the snapshot mode causes so big overload.
In my head, two factors might cause the overload:

- The perf interaction between the user space and kernel space;
- The data copying from the ETR's buffer to the AUX ring buffer.

Check one thing: what's the buffer size for ETR polling mode and for
snapshot mode in your experiments?

If I remember correctly, by default the snapshot mode uses 4MB for ETR
buffer, if copying 4MB per 10ms, then it's likely to cause big
overload.  So at the first glance, the overhead difference might be
caused by the by the different buffer size between ETR poll mode and
snapshot mode.

> We could probably utilize the ETM strobing feature and reduce frequency
> of data collection but I see a problem when I'm using both.
> Within a minute of profiling the ETM generates a reasonable profile size
> (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> But then the size grows unproportionally.
> With a 4 minute run I got a 6.3GB profile.

Just check, as Mathieu has suggested, have you applied the patch [1]
on your local code base for fixing the data copying for snapshot mode?

After applied this patch, one possibility for unproportional issue is
perf tool itself introduces many activities in snapshot mode (especially
for 10ms period), so the perf tool contributes much extra trace data.

Another potential issue is: after setting strobing mode, the snapshot
mode will disable the complete paths for tracers and ETR, so if the
strobing configuration is lost after re-enable tracers, then it might
cause the huge trace data in the later phase.  For this case, we
definitely should fix it.

> I don't see such a problem with the ETR polling patch.
> 
> Leo, could you please take a look at this problem?

Sure.  For easier reproducing the issue, could you share me the
detailed commands (and source code)?

P.s. I saw Mike suggests to continue the ETR polling development, this
is not conflict with snapshot mode.  At my side, I will investigate the
snapshot mode, but don't want to disturb the process for ETR polling
mode, so when the ETR polling patch series is get ready, please go
ahead for upstreaming the patch series.

Thanks,
Leo

[1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-18 14:00                 ` Leo Yan
@ 2021-05-18 14:14                   ` Leo Yan
  2021-05-18 15:41                   ` Mathieu Poirier
  2021-05-26  6:47                   ` Denis Nikitin
  2 siblings, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-05-18 14:14 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Tue, May 18, 2021 at 10:00:40PM +0800, Leo Yan wrote:

[...]

> > We could probably utilize the ETM strobing feature and reduce frequency
> > of data collection but I see a problem when I'm using both.
> > Within a minute of profiling the ETM generates a reasonable profile size
> > (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> > But then the size grows unproportionally.
> > With a 4 minute run I got a 6.3GB profile.
> >
> > I don't see such a problem with the ETR polling patch.
> > 
> > Leo, could you please take a look at this problem?

Or in the other way, could you share the perf data files?

Base on the perf data files, I think at least we can confirm what's the
AUX trace data size for every period, and we might also can quickly
confirm if the strobing configurations are lost or not in the middle of
tracing (To be honest, I have no knowledge for strobing, so need Mike's
help for confirmation).

Thanks,
Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-18 14:00                 ` Leo Yan
  2021-05-18 14:14                   ` Leo Yan
@ 2021-05-18 15:41                   ` Mathieu Poirier
  2021-05-26  6:47                   ` Denis Nikitin
  2 siblings, 0 replies; 35+ messages in thread
From: Mathieu Poirier @ 2021-05-18 15:41 UTC (permalink / raw)
  To: Leo Yan
  Cc: Denis Nikitin, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Tue, May 18, 2021 at 10:00:40PM +0800, Leo Yan wrote:
> Hi Denis,
> 
> On Fri, May 14, 2021 at 02:02:25AM -0700, Denis Nikitin wrote:
> 
> [...]
> 
> > Hi Mattieu and Leo,
> > 
> > I did some evaluation of the snapshot mode.
> 
> Thanks a lot for the evaluation and share back the result.
> 
> > Performance overhead is indeed higher than with ETR polling patch.
> > Here are some numbers for comparison (measured on browser
> > Speedometer2 benchmark):
> > Runtime overhead of ETM tracing with ETR poll period 100ms is less than
> > 0.5%. Snapshot mode gives 2.1%.
> > With 10ms period I see 4.6% with ETR polling and 22% in snapshot mode.
> 
> It's not expected that the snapshot mode causes so big overload.
> In my head, two factors might cause the overload:
> 
> - The perf interaction between the user space and kernel space;
> - The data copying from the ETR's buffer to the AUX ring buffer.
> 
> Check one thing: what's the buffer size for ETR polling mode and for
> snapshot mode in your experiments?
> 
> If I remember correctly, by default the snapshot mode uses 4MB for ETR
> buffer, if copying 4MB per 10ms, then it's likely to cause big
> overload.  So at the first glance, the overhead difference might be
> caused by the by the different buffer size between ETR poll mode and
> snapshot mode.
> 
> > We could probably utilize the ETM strobing feature and reduce frequency
> > of data collection but I see a problem when I'm using both.
> > Within a minute of profiling the ETM generates a reasonable profile size
> > (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> > But then the size grows unproportionally.
> > With a 4 minute run I got a 6.3GB profile.
> 
> Just check, as Mathieu has suggested, have you applied the patch [1]
> on your local code base for fixing the data copying for snapshot mode?
> 
> After applied this patch, one possibility for unproportional issue is
> perf tool itself introduces many activities in snapshot mode (especially
> for 10ms period), so the perf tool contributes much extra trace data.
> 
> Another potential issue is: after setting strobing mode, the snapshot
> mode will disable the complete paths for tracers and ETR, so if the
> strobing configuration is lost after re-enable tracers, then it might
> cause the huge trace data in the later phase.  For this case, we
> definitely should fix it.
> 
> > I don't see such a problem with the ETR polling patch.
> > 
> > Leo, could you please take a look at this problem?
> 
> Sure.  For easier reproducing the issue, could you share me the
> detailed commands (and source code)?
> 
> P.s. I saw Mike suggests to continue the ETR polling development, this
> is not conflict with snapshot mode.  At my side, I will investigate the
> snapshot mode, but don't want to disturb the process for ETR polling
> mode, so when the ETR polling patch series is get ready, please go
> ahead for upstreaming the patch series.

Just to clarify my position - I would definitely like to see a solution that
extends or re-use the current snapshot mode rather than introducing a new
mechanism to collect data.

> 
> Thanks,
> Leo
> 
> [1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-14  9:02               ` Denis Nikitin
  2021-05-14 16:16                 ` Mike Leach
  2021-05-18 14:00                 ` Leo Yan
@ 2021-05-23  8:45                 ` Leo Yan
  2021-05-27  7:50                   ` Denis Nikitin
  2 siblings, 1 reply; 35+ messages in thread
From: Leo Yan @ 2021-05-23  8:45 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Fri, May 14, 2021 at 02:02:25AM -0700, Denis Nikitin wrote:

[...]

> We could probably utilize the ETM strobing feature and reduce frequency
> of data collection but I see a problem when I'm using both.
> Within a minute of profiling the ETM generates a reasonable profile size
> (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> But then the size grows unproportionally.
> With a 4 minute run I got a 6.3GB profile.
> I don't see such a problem with the ETR polling patch.

I found there have a potential bug in the perf tool for calculation the
buffer size for snapshot mode.

In the function cs_etm_find_snapshot of perf code [1], after the "head"
has wrapped around, it always return the "old" and "head" with the
difference "mm->len", that means the trace data will be copied with
length "mm->len".  If the buffer size is 4MB, it always copies trace
data with 4MB for every time.

This is incorrect for the snapshot with very small interval, even after
wrapped around, it still have chance to only generate very small amount
trace data (e.g. for 10ms), so I think we should fix this code like:

diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
index d942f118d32c..8a60e65c6651 100644
--- a/tools/perf/arch/arm/util/cs-etm.c
+++ b/tools/perf/arch/arm/util/cs-etm.c
@@ -849,6 +849,13 @@ static int cs_etm_find_snapshot(struct auxtrace_record *itr,
 	if (!wrapped)
 		return 0;
 
+	/*
+	 * If the difference between the head and old is less than mm->len,
+	 * it means the new trace data is small so doesn't need ajust them.
+	 */
+	if (*head - *old < mm->len)
+		return 0;
+
 	/*
 	 * *head has wrapped around - adjust *head and *old to pickup the
 	 * entire content of the AUX buffer.

I will do more testing and send formal patch for this.

Thanks,
Leo

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm/util/cs-etm.c#n853

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply related	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-18 14:00                 ` Leo Yan
  2021-05-18 14:14                   ` Leo Yan
  2021-05-18 15:41                   ` Mathieu Poirier
@ 2021-05-26  6:47                   ` Denis Nikitin
  2 siblings, 0 replies; 35+ messages in thread
From: Denis Nikitin @ 2021-05-26  6:47 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

Hi Leo,

Sorry for the delayed reply.

On Tue, May 18, 2021 at 7:00 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Denis,
>

> > Performance overhead is indeed higher than with ETR polling patch.
> > Here are some numbers for comparison (measured on browser
> > Speedometer2 benchmark):
> > Runtime overhead of ETM tracing with ETR poll period 100ms is less than
> > 0.5%. Snapshot mode gives 2.1%.
> > With 10ms period I see 4.6% with ETR polling and 22% in snapshot mode.
>
> It's not expected that the snapshot mode causes so big overload.
> In my head, two factors might cause the overload:
>
> - The perf interaction between the user space and kernel space;
> - The data copying from the ETR's buffer to the AUX ring buffer.
>
> Check one thing: what's the buffer size for ETR polling mode and for
> snapshot mode in your experiments?

AUX buffer size was 64KB in both modes.
I used small buffers to keep the overall perf.data size under 200-500MB
with a long-running perf record (3-5 min).

>
> If I remember correctly, by default the snapshot mode uses 4MB for ETR
> buffer, if copying 4MB per 10ms, then it's likely to cause big
> overload.  So at the first glance, the overhead difference might be
> caused by the by the different buffer size between ETR poll mode and
> snapshot mode.

As I said, the buffer was small. But I can go ahead and check the difference
with a bigger buffer.
I will also double check how strobing affects runtime overhead. It
should be lower.

>
> > We could probably utilize the ETM strobing feature and reduce frequency
> > of data collection but I see a problem when I'm using both.
> > Within a minute of profiling the ETM generates a reasonable profile size
> > (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> > But then the size grows unproportionally.
> > With a 4 minute run I got a 6.3GB profile.
>
> Just check, as Mathieu has suggested, have you applied the patch [1]
> on your local code base for fixing the data copying for snapshot mode?

That was my mistake. When I switched to the Strobing patch series
I forgot to apply [1].
When applied I don't see this issue any more.
It's not obvious from the description that the patch would fix my issue.
So it sounds like your patch fixes multiple problems :)

Thanks,
Denis

[...]
>
> Thanks,
> Leo
>
> [1]. https://lists.linaro.org/pipermail/coresight/2021-April/006254.html

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-23  8:45                 ` Leo Yan
@ 2021-05-27  7:50                   ` Denis Nikitin
  2021-05-27 15:07                     ` Leo Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Denis Nikitin @ 2021-05-27  7:50 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

Hi Leo,

On Sun, May 23, 2021 at 1:45 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> On Fri, May 14, 2021 at 02:02:25AM -0700, Denis Nikitin wrote:
>
> [...]
>
> > We could probably utilize the ETM strobing feature and reduce frequency
> > of data collection but I see a problem when I'm using both.
> > Within a minute of profiling the ETM generates a reasonable profile size
> > (with strobing autofdo,preset=9 with period 0x1000 it is up to 20MB).
> > But then the size grows unproportionally.
> > With a 4 minute run I got a 6.3GB profile.
> > I don't see such a problem with the ETR polling patch.
>
> I found there have a potential bug in the perf tool for calculation the
> buffer size for snapshot mode.
>
> In the function cs_etm_find_snapshot of perf code [1], after the "head"
> has wrapped around, it always return the "old" and "head" with the
> difference "mm->len", that means the trace data will be copied with
> length "mm->len".  If the buffer size is 4MB, it always copies trace
> data with 4MB for every time.
>
> This is incorrect for the snapshot with very small interval, even after
> wrapped around, it still have chance to only generate very small amount
> trace data (e.g. for 10ms), so I think we should fix this code like:
>
> diff --git a/tools/perf/arch/arm/util/cs-etm.c b/tools/perf/arch/arm/util/cs-etm.c
> index d942f118d32c..8a60e65c6651 100644
> --- a/tools/perf/arch/arm/util/cs-etm.c
> +++ b/tools/perf/arch/arm/util/cs-etm.c
> @@ -849,6 +849,13 @@ static int cs_etm_find_snapshot(struct auxtrace_record *itr,
>         if (!wrapped)
>                 return 0;
>
> +       /*
> +        * If the difference between the head and old is less than mm->len,
> +        * it means the new trace data is small so doesn't need ajust them.
> +        */
> +       if (*head - *old < mm->len)
> +               return 0;
> +
>         /*
>          * *head has wrapped around - adjust *head and *old to pickup the
>          * entire content of the AUX buffer.
>
> I will do more testing and send formal patch for this.

Thanks for looking into this problem.
Initially I said that the issue with the excessive profile size was
not reproducible
when I applied https://lists.linaro.org/pipermail/coresight/2021-April/006254.html.
But after a couple of runs the issue came back.
Reproducibility depends on the system workload and it happens more often if
we run any workload.

I tried the fix you shared and so far I don't see the issue anymore.

Thanks,
Denis

>
> Thanks,
> Leo
>
> [1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm/util/cs-etm.c#n853

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-27  7:50                   ` Denis Nikitin
@ 2021-05-27 15:07                     ` Leo Yan
  2021-05-27 16:22                       ` Denis Nikitin
  0 siblings, 1 reply; 35+ messages in thread
From: Leo Yan @ 2021-05-27 15:07 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

Hi Denis,

On Thu, May 27, 2021 at 12:50:28AM -0700, Denis Nikitin wrote:

[...]

> Thanks for looking into this problem.
> Initially I said that the issue with the excessive profile size was
> not reproducible
> when I applied https://lists.linaro.org/pipermail/coresight/2021-April/006254.html.
> But after a couple of runs the issue came back.
> Reproducibility depends on the system workload and it happens more often if
> we run any workload.

Thanks a lot for the testing.

> I tried the fix you shared and so far I don't see the issue anymore.

Could you confirm here you mentioend "the fix" is the same one with [1]?

If these two fixes (one is for kernel driver and another is in the
perf tool) can resolve your issue, this is great!  I will resend the
patches.  Sorry for some delay at my side.

Thanks,
Leo

[1] https://lists.linaro.org/pipermail/coresight/2021-May/006411.html

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-27 15:07                     ` Leo Yan
@ 2021-05-27 16:22                       ` Denis Nikitin
  2021-05-28 16:37                         ` Leo Yan
  0 siblings, 1 reply; 35+ messages in thread
From: Denis Nikitin @ 2021-05-27 16:22 UTC (permalink / raw)
  To: Leo Yan
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

On Thu, May 27, 2021 at 8:07 AM Leo Yan <leo.yan@linaro.org> wrote:
>
> Hi Denis,
>
> On Thu, May 27, 2021 at 12:50:28AM -0700, Denis Nikitin wrote:
>
> [...]
>
> > Thanks for looking into this problem.
> > Initially I said that the issue with the excessive profile size was
> > not reproducible
> > when I applied https://lists.linaro.org/pipermail/coresight/2021-April/006254.html.
> > But after a couple of runs the issue came back.
> > Reproducibility depends on the system workload and it happens more often if
> > we run any workload.
>
> Thanks a lot for the testing.
>
> > I tried the fix you shared and so far I don't see the issue anymore.
>
> Could you confirm here you mentioend "the fix" is the same one with [1]?

Correct.

>
> If these two fixes (one is for kernel driver and another is in the
> perf tool) can resolve your issue, this is great!  I will resend the
> patches.  Sorry for some delay at my side.

Sounds great. Thanks, Leo!
I will take a look at runtime overhead meanwhile.

- Denis

>
> Thanks,
> Leo
>
> [1] https://lists.linaro.org/pipermail/coresight/2021-May/006411.html

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

* Re: [PATCH 0/4] coresight: Add ETR-PERF polling.
  2021-05-27 16:22                       ` Denis Nikitin
@ 2021-05-28 16:37                         ` Leo Yan
  0 siblings, 0 replies; 35+ messages in thread
From: Leo Yan @ 2021-05-28 16:37 UTC (permalink / raw)
  To: Denis Nikitin
  Cc: Mathieu Poirier, Mike Leach, coresight, linux-arm-kernel, Daniel Kiss

Hi Denis,

On Thu, May 27, 2021 at 09:22:45AM -0700, Denis Nikitin wrote:

[...]

> > If these two fixes (one is for kernel driver and another is in the
> > perf tool) can resolve your issue, this is great!  I will resend the
> > patches.  Sorry for some delay at my side.
> 
> Sounds great. Thanks, Leo!

Welcome!

> I will take a look at runtime overhead meanwhile.

Just info, I have sent out the patch series to address snapshot
issues: https://lore.kernel.org/patchwork/cover/1437696/

It would be appreciated if you could verify these patches, thanks!

Leo

_______________________________________________
linux-arm-kernel mailing list
linux-arm-kernel@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

^ permalink raw reply	[flat|nested] 35+ messages in thread

end of thread, other threads:[~2021-05-28 16:39 UTC | newest]

Thread overview: 35+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-04-21 12:04 [PATCH 0/4] coresight: Add ETR-PERF polling Daniel Kiss
2021-04-21 12:04 ` [PATCH 1/4] coresight: tmc-etr: Advance buffer pointer in sync buffer Daniel Kiss
2021-04-23  8:23   ` Leo Yan
2021-04-26 10:40   ` Suzuki K Poulose
2021-04-27  3:45     ` Leo Yan
2021-04-27 10:00       ` Suzuki K Poulose
2021-04-28  2:34         ` Leo Yan
2021-04-21 12:04 ` [PATCH 2/4] coresight: tmc-etr: Track perf handler Daniel Kiss
2021-04-23  9:20   ` Leo Yan
2021-04-26  0:25     ` Leo Yan
2021-04-21 12:04 ` [PATCH 3/4] coresight: etm-perf: Export etm_event_cpu_path Daniel Kiss
2021-04-21 12:04 ` [PATCH 4/4] coresight: Add ETR-PERF polling Daniel Kiss
2021-04-26  1:18   ` Leo Yan
2021-05-05  7:21   ` Denis Nikitin
2021-04-26 17:54 ` [PATCH 0/4] " Mathieu Poirier
2021-04-27 10:43   ` Al Grant
2021-04-27 14:41     ` Mike Leach
2021-04-27 15:47       ` Mathieu Poirier
2021-04-27 16:04         ` Leo Yan
2021-05-05  6:46           ` Denis Nikitin
2021-05-05 15:29             ` Mathieu Poirier
2021-05-14  9:02               ` Denis Nikitin
2021-05-14 16:16                 ` Mike Leach
2021-05-18 14:00                 ` Leo Yan
2021-05-18 14:14                   ` Leo Yan
2021-05-18 15:41                   ` Mathieu Poirier
2021-05-26  6:47                   ` Denis Nikitin
2021-05-23  8:45                 ` Leo Yan
2021-05-27  7:50                   ` Denis Nikitin
2021-05-27 15:07                     ` Leo Yan
2021-05-27 16:22                       ` Denis Nikitin
2021-05-28 16:37                         ` Leo Yan
2021-04-27 16:24 ` James Clark
2021-04-28 11:30   ` James Clark
2021-04-28 11:52   ` Daniel Kiss

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).