linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/2] Add CCPI2 PMU support
@ 2019-06-14 17:42 Ganapatrao Kulkarni
  2019-06-14 17:42 ` [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver Ganapatrao Kulkarni
  2019-06-14 17:42 ` [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver Ganapatrao Kulkarni
  0 siblings, 2 replies; 9+ messages in thread
From: Ganapatrao Kulkarni @ 2019-06-14 17:42 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-arm-kernel
  Cc: Will.Deacon, mark.rutland, corbet, jnair, Robert.Richter,
	Jan.Glauber, gklkml16

Add Cavium Coherent Processor Interconnect (CCPI2) PMU
support in ThunderX2 Uncore driver.

Ganapatrao Kulkarni (2):
  Documentation: perf: Update documentation for ThunderX2 PMU uncore
    driver
  drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.

 Documentation/perf/thunderx2-pmu.txt |  20 +--
 drivers/perf/thunderx2_pmu.c         | 179 +++++++++++++++++++++++----
 2 files changed, 168 insertions(+), 31 deletions(-)

-- 
2.17.1


^ permalink raw reply	[flat|nested] 9+ messages in thread

* [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
  2019-06-14 17:42 [PATCH 0/2] Add CCPI2 PMU support Ganapatrao Kulkarni
@ 2019-06-14 17:42 ` Ganapatrao Kulkarni
  2019-06-27 10:01   ` Will Deacon
  2019-06-14 17:42 ` [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver Ganapatrao Kulkarni
  1 sibling, 1 reply; 9+ messages in thread
From: Ganapatrao Kulkarni @ 2019-06-14 17:42 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-arm-kernel
  Cc: Will.Deacon, mark.rutland, corbet, jnair, Robert.Richter,
	Jan.Glauber, gklkml16

From: Ganapatrao Kulkarni <ganapatrao.kulkarni@marvell.com>

Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.

Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
---
 Documentation/perf/thunderx2-pmu.txt | 20 +++++++++++---------
 1 file changed, 11 insertions(+), 9 deletions(-)

diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
index dffc57143736..62243230abc3 100644
--- a/Documentation/perf/thunderx2-pmu.txt
+++ b/Documentation/perf/thunderx2-pmu.txt
@@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
 =============================================================
 
 The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
-PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
+PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
+Cavium Coherent Processor Interconnect (CCPI2).
 
 The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
 Events are counted for the default channel (i.e. channel 0) and prorated
 to the total number of channels/tiles.
 
-The DMC and L3C support up to 4 counters. Counters are independently
-programmable and can be started and stopped individually. Each counter
-can be set to a different event. Counters are 32-bit and do not support
-an overflow interrupt; they are read every 2 seconds.
+The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.
+Counters are independently programmable and can be started and stopped
+individually. Each counter can be set to a different event. DMC and L3C
+Counters are 32-bit and do not support an overflow interrupt; they are read
+every 2 seconds. CCPI2 counters are 64-bit.
 
 PMU UNCORE (perf) driver:
 
 The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
-L3C devices.  Each PMU can be used to count up to 4 events
-simultaneously. The PMUs provide a description of their available events
-and configuration options under sysfs, see
-/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
+L3C devices.  Each PMU can be used to count up to 4(DMC/L3C) or up to 8
+(CCPI2) events simultaneously. The PMUs provide a description of their
+available events and configuration options under sysfs, see
+/sys/devices/uncore_<l3c_S/dmc_S/ccpi2_S/>; S is the socket id.
 
 The driver does not support sampling, therefore "perf record" will not
 work. Per-task perf sessions are also not supported.
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
  2019-06-14 17:42 [PATCH 0/2] Add CCPI2 PMU support Ganapatrao Kulkarni
  2019-06-14 17:42 ` [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver Ganapatrao Kulkarni
@ 2019-06-14 17:42 ` Ganapatrao Kulkarni
  2019-06-27  9:57   ` Will Deacon
  1 sibling, 1 reply; 9+ messages in thread
From: Ganapatrao Kulkarni @ 2019-06-14 17:42 UTC (permalink / raw)
  To: linux-doc, linux-kernel, linux-arm-kernel
  Cc: Will.Deacon, mark.rutland, corbet, jnair, Robert.Richter,
	Jan.Glauber, gklkml16

CCPI2 is a low-latency high-bandwidth serial interface for connecting
ThunderX2 processors. This patch adds support to capture CCPI2 perf events.

Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
---
 drivers/perf/thunderx2_pmu.c | 179 ++++++++++++++++++++++++++++++-----
 1 file changed, 157 insertions(+), 22 deletions(-)

diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
index 43d76c85da56..3791ac9b897d 100644
--- a/drivers/perf/thunderx2_pmu.c
+++ b/drivers/perf/thunderx2_pmu.c
@@ -16,7 +16,9 @@
  * they need to be sampled before overflow(i.e, at every 2 seconds).
  */
 
-#define TX2_PMU_MAX_COUNTERS		4
+#define TX2_PMU_DMC_L3C_MAX_COUNTERS	4
+#define TX2_PMU_CCPI2_MAX_COUNTERS	8
+
 #define TX2_PMU_DMC_CHANNELS		8
 #define TX2_PMU_L3_TILES		16
 
@@ -28,11 +30,22 @@
   */
 #define DMC_EVENT_CFG(idx, val)		((val) << (((idx) * 8) + 1))
 
+#define GET_EVENTID_CCPI2(ev)		((ev->hw.config) & 0x1ff)
+/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
+#define GET_COUNTERID_CCPI2(ev)		((ev->hw.idx) & 0x7)
+#define CCPI2_COUNTER_OFFSET		8
+
 #define L3C_COUNTER_CTL			0xA8
 #define L3C_COUNTER_DATA		0xAC
 #define DMC_COUNTER_CTL			0x234
 #define DMC_COUNTER_DATA		0x240
 
+#define CCPI2_PERF_CTL			0x108
+#define CCPI2_COUNTER_CTL		0x10C
+#define CCPI2_COUNTER_SEL		0x12c
+#define CCPI2_COUNTER_DATA_L		0x130
+#define CCPI2_COUNTER_DATA_H		0x134
+
 /* L3C event IDs */
 #define L3_EVENT_READ_REQ		0xD
 #define L3_EVENT_WRITEBACK_REQ		0xE
@@ -51,9 +64,16 @@
 #define DMC_EVENT_READ_TXNS		0xF
 #define DMC_EVENT_MAX			0x10
 
+#define CCPI2_EVENT_REQ_PKT_SENT	0x3D
+#define CCPI2_EVENT_SNOOP_PKT_SENT	0x65
+#define CCPI2_EVENT_DATA_PKT_SENT	0x105
+#define CCPI2_EVENT_GIC_PKT_SENT	0x12D
+#define CCPI2_EVENT_MAX			0x200
+
 enum tx2_uncore_type {
 	PMU_TYPE_L3C,
 	PMU_TYPE_DMC,
+	PMU_TYPE_CCPI2,
 	PMU_TYPE_INVALID,
 };
 
@@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
 	u32 max_events;
 	u64 hrtimer_interval;
 	void __iomem *base;
-	DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
-	struct perf_event *events[TX2_PMU_MAX_COUNTERS];
+	DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
+	struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];
 	struct device *dev;
 	struct hrtimer hrtimer;
 	const struct attribute_group **attr_groups;
@@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
 	return container_of(pmu, struct tx2_uncore_pmu, pmu);
 }
 
-PMU_FORMAT_ATTR(event,	"config:0-4");
+#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)			\
+static ssize_t								\
+__tx2_pmu_##_var##_show(struct device *dev,				\
+			       struct device_attribute *attr,		\
+			       char *page)				\
+{									\
+	BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);			\
+	return sprintf(page, _format "\n");				\
+}									\
+									\
+static struct device_attribute format_attr_##_var =			\
+	__ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
+
+TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
+TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");
 
 static struct attribute *l3c_pmu_format_attrs[] = {
 	&format_attr_event.attr,
@@ -104,6 +138,11 @@ static struct attribute *dmc_pmu_format_attrs[] = {
 	NULL,
 };
 
+static struct attribute *ccpi2_pmu_format_attrs[] = {
+	&format_attr_event_ccpi2.attr,
+	NULL,
+};
+
 static const struct attribute_group l3c_pmu_format_attr_group = {
 	.name = "format",
 	.attrs = l3c_pmu_format_attrs,
@@ -114,6 +153,11 @@ static const struct attribute_group dmc_pmu_format_attr_group = {
 	.attrs = dmc_pmu_format_attrs,
 };
 
+static const struct attribute_group ccpi2_pmu_format_attr_group = {
+	.name = "format",
+	.attrs = ccpi2_pmu_format_attrs,
+};
+
 /*
  * sysfs event attributes
  */
@@ -164,6 +208,19 @@ static struct attribute *dmc_pmu_events_attrs[] = {
 	NULL,
 };
 
+TX2_EVENT_ATTR(req_pktsent, CCPI2_EVENT_REQ_PKT_SENT);
+TX2_EVENT_ATTR(snoop_pktsent, CCPI2_EVENT_SNOOP_PKT_SENT);
+TX2_EVENT_ATTR(data_pktsent, CCPI2_EVENT_DATA_PKT_SENT);
+TX2_EVENT_ATTR(gic_pktsent, CCPI2_EVENT_GIC_PKT_SENT);
+
+static struct attribute *ccpi2_pmu_events_attrs[] = {
+	&tx2_pmu_event_attr_req_pktsent.attr.attr,
+	&tx2_pmu_event_attr_snoop_pktsent.attr.attr,
+	&tx2_pmu_event_attr_data_pktsent.attr.attr,
+	&tx2_pmu_event_attr_gic_pktsent.attr.attr,
+	NULL,
+};
+
 static const struct attribute_group l3c_pmu_events_attr_group = {
 	.name = "events",
 	.attrs = l3c_pmu_events_attrs,
@@ -174,6 +231,11 @@ static const struct attribute_group dmc_pmu_events_attr_group = {
 	.attrs = dmc_pmu_events_attrs,
 };
 
+static const struct attribute_group ccpi2_pmu_events_attr_group = {
+	.name = "events",
+	.attrs = ccpi2_pmu_events_attrs,
+};
+
 /*
  * sysfs cpumask attributes
  */
@@ -213,6 +275,13 @@ static const struct attribute_group *dmc_pmu_attr_groups[] = {
 	NULL
 };
 
+static const struct attribute_group *ccpi2_pmu_attr_groups[] = {
+	&ccpi2_pmu_format_attr_group,
+	&pmu_cpumask_attr_group,
+	&ccpi2_pmu_events_attr_group,
+	NULL
+};
+
 static inline u32 reg_readl(unsigned long addr)
 {
 	return readl((void __iomem *)addr);
@@ -265,6 +334,17 @@ static void init_cntr_base_dmc(struct perf_event *event,
 		+ DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
 }
 
+static void init_cntr_base_ccpi2(struct perf_event *event,
+		struct tx2_uncore_pmu *tx2_pmu)
+{
+
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->config_base = (unsigned long)tx2_pmu->base
+		+ CCPI2_COUNTER_CTL + (4 * GET_COUNTERID_CCPI2(event));
+	hwc->event_base =  (unsigned long)tx2_pmu->base;
+}
+
 static void uncore_start_event_l3c(struct perf_event *event, int flags)
 {
 	u32 val;
@@ -312,6 +392,29 @@ static void uncore_stop_event_dmc(struct perf_event *event)
 	reg_writel(val, hwc->config_base);
 }
 
+static void uncore_start_event_ccpi2(struct perf_event *event, int flags)
+{
+	u32 val;
+	struct hw_perf_event *hwc = &event->hw;
+
+	/* Bit [09:00] to set event id, set level and type to 1 */
+	val = reg_readl(hwc->config_base);
+	reg_writel((val & ~0xFFF) | (3 << 10) |
+			GET_EVENTID_CCPI2(event), hwc->config_base);
+	/* reset[4], enable[0] and start[1] counters */
+	reg_writel(0x13, hwc->event_base + CCPI2_PERF_CTL);
+	local64_set(&event->hw.prev_count, 0ULL);
+}
+
+static void uncore_stop_event_ccpi2(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	/* disable and stop counter */
+	reg_writel(0, hwc->event_base + CCPI2_PERF_CTL);
+	reg_writel(0, hwc->config_base);
+}
+
 static void tx2_uncore_event_update(struct perf_event *event)
 {
 	s64 prev, delta, new = 0;
@@ -323,12 +426,20 @@ static void tx2_uncore_event_update(struct perf_event *event)
 	tx2_pmu = pmu_to_tx2_pmu(event->pmu);
 	type = tx2_pmu->type;
 	prorate_factor = tx2_pmu->prorate_factor;
-
-	new = reg_readl(hwc->event_base);
-	prev = local64_xchg(&hwc->prev_count, new);
-
-	/* handles rollover of 32 bit counter */
-	delta = (u32)(((1UL << 32) - prev) + new);
+	if (type == PMU_TYPE_CCPI2) {
+		reg_writel(CCPI2_COUNTER_OFFSET + GET_COUNTERID_CCPI2(event),
+					hwc->event_base + CCPI2_COUNTER_SEL);
+		new = reg_readl(hwc->event_base + CCPI2_COUNTER_DATA_L);
+		new |= (u64)reg_readl(hwc->event_base +
+				CCPI2_COUNTER_DATA_H) << 32;
+		prev = local64_xchg(&hwc->prev_count, new);
+		delta = new - prev;
+	} else {
+		new = reg_readl(hwc->event_base);
+		prev = local64_xchg(&hwc->prev_count, new);
+		/* handles rollover of 32 bit counter */
+		delta = (u32)(((1UL << 32) - prev) + new);
+	}
 
 	/* DMC event data_transfers granularity is 16 Bytes, convert it to 64 */
 	if (type == PMU_TYPE_DMC &&
@@ -351,6 +462,7 @@ static enum tx2_uncore_type get_tx2_pmu_type(struct acpi_device *adev)
 	} devices[] = {
 		{"CAV901D", PMU_TYPE_L3C},
 		{"CAV901F", PMU_TYPE_DMC},
+		{"CAV901E", PMU_TYPE_CCPI2},
 		{"", PMU_TYPE_INVALID}
 	};
 
@@ -380,7 +492,8 @@ static bool tx2_uncore_validate_event(struct pmu *pmu,
  * Make sure the group of events can be scheduled at once
  * on the PMU.
  */
-static bool tx2_uncore_validate_event_group(struct perf_event *event)
+static bool tx2_uncore_validate_event_group(struct perf_event *event,
+		int max_counters)
 {
 	struct perf_event *sibling, *leader = event->group_leader;
 	int counters = 0;
@@ -403,7 +516,7 @@ static bool tx2_uncore_validate_event_group(struct perf_event *event)
 	 * If the group requires more counters than the HW has,
 	 * it cannot ever be scheduled.
 	 */
-	return counters <= TX2_PMU_MAX_COUNTERS;
+	return counters <= max_counters;
 }
 
 
@@ -439,7 +552,7 @@ static int tx2_uncore_event_init(struct perf_event *event)
 	hwc->config = event->attr.config;
 
 	/* Validate the group */
-	if (!tx2_uncore_validate_event_group(event))
+	if (!tx2_uncore_validate_event_group(event, tx2_pmu->max_counters))
 		return -EINVAL;
 
 	return 0;
@@ -457,7 +570,8 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
 	perf_event_update_userpage(event);
 
 	/* Start timer for first event */
-	if (bitmap_weight(tx2_pmu->active_counters,
+	if (tx2_pmu->type != PMU_TYPE_CCPI2 &&
+			bitmap_weight(tx2_pmu->active_counters,
 				tx2_pmu->max_counters) == 1) {
 		hrtimer_start(&tx2_pmu->hrtimer,
 			ns_to_ktime(tx2_pmu->hrtimer_interval),
@@ -495,7 +609,8 @@ static int tx2_uncore_event_add(struct perf_event *event, int flags)
 	if (hwc->idx < 0)
 		return -EAGAIN;
 
-	tx2_pmu->events[hwc->idx] = event;
+	if (tx2_pmu->type != PMU_TYPE_CCPI2)
+		tx2_pmu->events[hwc->idx] = event;
 	/* set counter control and data registers base address */
 	tx2_pmu->init_cntr_base(event, tx2_pmu);
 
@@ -514,10 +629,14 @@ static void tx2_uncore_event_del(struct perf_event *event, int flags)
 	tx2_uncore_event_stop(event, PERF_EF_UPDATE);
 
 	/* clear the assigned counter */
-	free_counter(tx2_pmu, GET_COUNTERID(event));
+	if (tx2_pmu->type == PMU_TYPE_CCPI2)
+		free_counter(tx2_pmu, GET_COUNTERID_CCPI2(event));
+	else
+		free_counter(tx2_pmu, GET_COUNTERID(event));
 
 	perf_event_update_userpage(event);
-	tx2_pmu->events[hwc->idx] = NULL;
+	if (tx2_pmu->type != PMU_TYPE_CCPI2)
+		tx2_pmu->events[hwc->idx] = NULL;
 	hwc->idx = -1;
 }
 
@@ -580,8 +699,12 @@ static int tx2_uncore_pmu_add_dev(struct tx2_uncore_pmu *tx2_pmu)
 			cpu_online_mask);
 
 	tx2_pmu->cpu = cpu;
-	hrtimer_init(&tx2_pmu->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
-	tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
+	/* CCPI2 counters are 64 bit counters. */
+	if (tx2_pmu->type != PMU_TYPE_CCPI2) {
+		hrtimer_init(&tx2_pmu->hrtimer,
+				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+		tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
+	}
 
 	ret = tx2_uncore_pmu_register(tx2_pmu);
 	if (ret) {
@@ -653,7 +776,7 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
 
 	switch (tx2_pmu->type) {
 	case PMU_TYPE_L3C:
-		tx2_pmu->max_counters = TX2_PMU_MAX_COUNTERS;
+		tx2_pmu->max_counters = TX2_PMU_DMC_L3C_MAX_COUNTERS;
 		tx2_pmu->prorate_factor = TX2_PMU_L3_TILES;
 		tx2_pmu->max_events = L3_EVENT_MAX;
 		tx2_pmu->hrtimer_interval = TX2_PMU_HRTIMER_INTERVAL;
@@ -665,7 +788,7 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
 		tx2_pmu->stop_event = uncore_stop_event_l3c;
 		break;
 	case PMU_TYPE_DMC:
-		tx2_pmu->max_counters = TX2_PMU_MAX_COUNTERS;
+		tx2_pmu->max_counters = TX2_PMU_DMC_L3C_MAX_COUNTERS;
 		tx2_pmu->prorate_factor = TX2_PMU_DMC_CHANNELS;
 		tx2_pmu->max_events = DMC_EVENT_MAX;
 		tx2_pmu->hrtimer_interval = TX2_PMU_HRTIMER_INTERVAL;
@@ -676,6 +799,17 @@ static struct tx2_uncore_pmu *tx2_uncore_pmu_init_dev(struct device *dev,
 		tx2_pmu->start_event = uncore_start_event_dmc;
 		tx2_pmu->stop_event = uncore_stop_event_dmc;
 		break;
+	case PMU_TYPE_CCPI2:
+		tx2_pmu->max_counters = TX2_PMU_CCPI2_MAX_COUNTERS;
+		tx2_pmu->prorate_factor = 1;
+		tx2_pmu->max_events = CCPI2_EVENT_MAX;
+		tx2_pmu->attr_groups = ccpi2_pmu_attr_groups;
+		tx2_pmu->name = devm_kasprintf(dev, GFP_KERNEL,
+				"uncore_ccpi2_%d", tx2_pmu->node);
+		tx2_pmu->init_cntr_base = init_cntr_base_ccpi2;
+		tx2_pmu->start_event = uncore_start_event_ccpi2;
+		tx2_pmu->stop_event = uncore_stop_event_ccpi2;
+		break;
 	case PMU_TYPE_INVALID:
 		devm_kfree(dev, tx2_pmu);
 		return NULL;
@@ -744,7 +878,8 @@ static int tx2_uncore_pmu_offline_cpu(unsigned int cpu,
 	if (cpu != tx2_pmu->cpu)
 		return 0;
 
-	hrtimer_cancel(&tx2_pmu->hrtimer);
+	if (tx2_pmu->type != PMU_TYPE_CCPI2)
+		hrtimer_cancel(&tx2_pmu->hrtimer);
 	cpumask_copy(&cpu_online_mask_temp, cpu_online_mask);
 	cpumask_clear_cpu(cpu, &cpu_online_mask_temp);
 	new_cpu = cpumask_any_and(
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
  2019-06-14 17:42 ` [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver Ganapatrao Kulkarni
@ 2019-06-27  9:57   ` Will Deacon
  2019-06-28  5:39     ` Ganapatrao Kulkarni
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-06-27  9:57 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-doc, linux-kernel, linux-arm-kernel, Will.Deacon,
	mark.rutland, corbet, jnair, Robert.Richter, Jan.Glauber,
	gklkml16

Hi Ganapat,

On Fri, Jun 14, 2019 at 05:42:46PM +0000, Ganapatrao Kulkarni wrote:
> CCPI2 is a low-latency high-bandwidth serial interface for connecting
> ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> 
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> ---
>  drivers/perf/thunderx2_pmu.c | 179 ++++++++++++++++++++++++++++++-----
>  1 file changed, 157 insertions(+), 22 deletions(-)
> 
> diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> index 43d76c85da56..3791ac9b897d 100644
> --- a/drivers/perf/thunderx2_pmu.c
> +++ b/drivers/perf/thunderx2_pmu.c
> @@ -16,7 +16,9 @@
>   * they need to be sampled before overflow(i.e, at every 2 seconds).
>   */
>  
> -#define TX2_PMU_MAX_COUNTERS		4
> +#define TX2_PMU_DMC_L3C_MAX_COUNTERS	4

I find it odd that you rename this...

> +#define TX2_PMU_CCPI2_MAX_COUNTERS	8
> +
>  #define TX2_PMU_DMC_CHANNELS		8
>  #define TX2_PMU_L3_TILES		16
>  
> @@ -28,11 +30,22 @@
>    */
>  #define DMC_EVENT_CFG(idx, val)		((val) << (((idx) * 8) + 1))
>  
> +#define GET_EVENTID_CCPI2(ev)		((ev->hw.config) & 0x1ff)
> +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> +#define GET_COUNTERID_CCPI2(ev)		((ev->hw.idx) & 0x7)
> +#define CCPI2_COUNTER_OFFSET		8


... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
events. Saying that, if you have the event you can figure out its type,
so could you avoid the need for additional macros entirely and just use
the correct masks based on the corresponding PMU type? That might also
avoid some of the conditional control flow you're introducing elsewhere.

>  #define L3C_COUNTER_CTL			0xA8
>  #define L3C_COUNTER_DATA		0xAC
>  #define DMC_COUNTER_CTL			0x234
>  #define DMC_COUNTER_DATA		0x240
>  
> +#define CCPI2_PERF_CTL			0x108
> +#define CCPI2_COUNTER_CTL		0x10C
> +#define CCPI2_COUNTER_SEL		0x12c
> +#define CCPI2_COUNTER_DATA_L		0x130
> +#define CCPI2_COUNTER_DATA_H		0x134
> +
>  /* L3C event IDs */
>  #define L3_EVENT_READ_REQ		0xD
>  #define L3_EVENT_WRITEBACK_REQ		0xE
> @@ -51,9 +64,16 @@
>  #define DMC_EVENT_READ_TXNS		0xF
>  #define DMC_EVENT_MAX			0x10
>  
> +#define CCPI2_EVENT_REQ_PKT_SENT	0x3D
> +#define CCPI2_EVENT_SNOOP_PKT_SENT	0x65
> +#define CCPI2_EVENT_DATA_PKT_SENT	0x105
> +#define CCPI2_EVENT_GIC_PKT_SENT	0x12D
> +#define CCPI2_EVENT_MAX			0x200
> +
>  enum tx2_uncore_type {
>  	PMU_TYPE_L3C,
>  	PMU_TYPE_DMC,
> +	PMU_TYPE_CCPI2,
>  	PMU_TYPE_INVALID,
>  };
>  
> @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
>  	u32 max_events;
>  	u64 hrtimer_interval;
>  	void __iomem *base;
> -	DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> -	struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> +	DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> +	struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];

Hmm, that looks very odd. Why are they now different sizes?

>  	struct device *dev;
>  	struct hrtimer hrtimer;
>  	const struct attribute_group **attr_groups;
> @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
>  	return container_of(pmu, struct tx2_uncore_pmu, pmu);
>  }
>  
> -PMU_FORMAT_ATTR(event,	"config:0-4");
> +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)			\
> +static ssize_t								\
> +__tx2_pmu_##_var##_show(struct device *dev,				\
> +			       struct device_attribute *attr,		\
> +			       char *page)				\
> +{									\
> +	BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);			\
> +	return sprintf(page, _format "\n");				\
> +}									\
> +									\
> +static struct device_attribute format_attr_##_var =			\
> +	__ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
> +
> +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
> +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");

This doesn't look right. Can perf deal with overlapping fields? It seems
wrong that we'd allow the user to specify both, for example.

>  
>  static struct attribute *l3c_pmu_format_attrs[] = {
>  	&format_attr_event.attr,
> @@ -104,6 +138,11 @@ static struct attribute *dmc_pmu_format_attrs[] = {
>  	NULL,
>  };
>  
> +static struct attribute *ccpi2_pmu_format_attrs[] = {
> +	&format_attr_event_ccpi2.attr,
> +	NULL,
> +};
> +
>  static const struct attribute_group l3c_pmu_format_attr_group = {
>  	.name = "format",
>  	.attrs = l3c_pmu_format_attrs,
> @@ -114,6 +153,11 @@ static const struct attribute_group dmc_pmu_format_attr_group = {
>  	.attrs = dmc_pmu_format_attrs,
>  };
>  
> +static const struct attribute_group ccpi2_pmu_format_attr_group = {
> +	.name = "format",
> +	.attrs = ccpi2_pmu_format_attrs,
> +};
> +
>  /*
>   * sysfs event attributes
>   */
> @@ -164,6 +208,19 @@ static struct attribute *dmc_pmu_events_attrs[] = {
>  	NULL,
>  };
>  
> +TX2_EVENT_ATTR(req_pktsent, CCPI2_EVENT_REQ_PKT_SENT);
> +TX2_EVENT_ATTR(snoop_pktsent, CCPI2_EVENT_SNOOP_PKT_SENT);
> +TX2_EVENT_ATTR(data_pktsent, CCPI2_EVENT_DATA_PKT_SENT);
> +TX2_EVENT_ATTR(gic_pktsent, CCPI2_EVENT_GIC_PKT_SENT);
> +
> +static struct attribute *ccpi2_pmu_events_attrs[] = {
> +	&tx2_pmu_event_attr_req_pktsent.attr.attr,
> +	&tx2_pmu_event_attr_snoop_pktsent.attr.attr,
> +	&tx2_pmu_event_attr_data_pktsent.attr.attr,
> +	&tx2_pmu_event_attr_gic_pktsent.attr.attr,
> +	NULL,
> +};
> +
>  static const struct attribute_group l3c_pmu_events_attr_group = {
>  	.name = "events",
>  	.attrs = l3c_pmu_events_attrs,
> @@ -174,6 +231,11 @@ static const struct attribute_group dmc_pmu_events_attr_group = {
>  	.attrs = dmc_pmu_events_attrs,
>  };
>  
> +static const struct attribute_group ccpi2_pmu_events_attr_group = {
> +	.name = "events",
> +	.attrs = ccpi2_pmu_events_attrs,
> +};
> +
>  /*
>   * sysfs cpumask attributes
>   */
> @@ -213,6 +275,13 @@ static const struct attribute_group *dmc_pmu_attr_groups[] = {
>  	NULL
>  };
>  
> +static const struct attribute_group *ccpi2_pmu_attr_groups[] = {
> +	&ccpi2_pmu_format_attr_group,
> +	&pmu_cpumask_attr_group,
> +	&ccpi2_pmu_events_attr_group,
> +	NULL
> +};
> +
>  static inline u32 reg_readl(unsigned long addr)
>  {
>  	return readl((void __iomem *)addr);
> @@ -265,6 +334,17 @@ static void init_cntr_base_dmc(struct perf_event *event,
>  		+ DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
>  }
>  
> +static void init_cntr_base_ccpi2(struct perf_event *event,
> +		struct tx2_uncore_pmu *tx2_pmu)
> +{
> +
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	hwc->config_base = (unsigned long)tx2_pmu->base
> +		+ CCPI2_COUNTER_CTL + (4 * GET_COUNTERID_CCPI2(event));
> +	hwc->event_base =  (unsigned long)tx2_pmu->base;
> +}
> +
>  static void uncore_start_event_l3c(struct perf_event *event, int flags)
>  {
>  	u32 val;
> @@ -312,6 +392,29 @@ static void uncore_stop_event_dmc(struct perf_event *event)
>  	reg_writel(val, hwc->config_base);
>  }
>  
> +static void uncore_start_event_ccpi2(struct perf_event *event, int flags)
> +{
> +	u32 val;
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	/* Bit [09:00] to set event id, set level and type to 1 */
> +	val = reg_readl(hwc->config_base);
> +	reg_writel((val & ~0xFFF) | (3 << 10) |
> +			GET_EVENTID_CCPI2(event), hwc->config_base);
> +	/* reset[4], enable[0] and start[1] counters */
> +	reg_writel(0x13, hwc->event_base + CCPI2_PERF_CTL);
> +	local64_set(&event->hw.prev_count, 0ULL);
> +}
> +
> +static void uncore_stop_event_ccpi2(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	/* disable and stop counter */
> +	reg_writel(0, hwc->event_base + CCPI2_PERF_CTL);

How come you need to clear the event register here? You don't do that for
the DMC/L3C paths.

> +	reg_writel(0, hwc->config_base);

When starting event you're careful to update this using a read-modify-write
sequence. Why is it safe to zero the whole thing when stopping?

> +}
> +
>  static void tx2_uncore_event_update(struct perf_event *event)
>  {
>  	s64 prev, delta, new = 0;
> @@ -323,12 +426,20 @@ static void tx2_uncore_event_update(struct perf_event *event)
>  	tx2_pmu = pmu_to_tx2_pmu(event->pmu);
>  	type = tx2_pmu->type;
>  	prorate_factor = tx2_pmu->prorate_factor;
> -
> -	new = reg_readl(hwc->event_base);
> -	prev = local64_xchg(&hwc->prev_count, new);
> -
> -	/* handles rollover of 32 bit counter */
> -	delta = (u32)(((1UL << 32) - prev) + new);
> +	if (type == PMU_TYPE_CCPI2) {
> +		reg_writel(CCPI2_COUNTER_OFFSET + GET_COUNTERID_CCPI2(event),
> +					hwc->event_base + CCPI2_COUNTER_SEL);
> +		new = reg_readl(hwc->event_base + CCPI2_COUNTER_DATA_L);
> +		new |= (u64)reg_readl(hwc->event_base +
> +				CCPI2_COUNTER_DATA_H) << 32;

Can you not access the event register using a 64-bit read?

> +		prev = local64_xchg(&hwc->prev_count, new);
> +		delta = new - prev;
> +	} else {
> +		new = reg_readl(hwc->event_base);
> +		prev = local64_xchg(&hwc->prev_count, new);
> +		/* handles rollover of 32 bit counter */
> +		delta = (u32)(((1UL << 32) - prev) + new);
> +	}
>  
>  	/* DMC event data_transfers granularity is 16 Bytes, convert it to 64 */
>  	if (type == PMU_TYPE_DMC &&
> @@ -351,6 +462,7 @@ static enum tx2_uncore_type get_tx2_pmu_type(struct acpi_device *adev)
>  	} devices[] = {
>  		{"CAV901D", PMU_TYPE_L3C},
>  		{"CAV901F", PMU_TYPE_DMC},
> +		{"CAV901E", PMU_TYPE_CCPI2},
>  		{"", PMU_TYPE_INVALID}
>  	};
>  
> @@ -380,7 +492,8 @@ static bool tx2_uncore_validate_event(struct pmu *pmu,
>   * Make sure the group of events can be scheduled at once
>   * on the PMU.
>   */
> -static bool tx2_uncore_validate_event_group(struct perf_event *event)
> +static bool tx2_uncore_validate_event_group(struct perf_event *event,
> +		int max_counters)
>  {
>  	struct perf_event *sibling, *leader = event->group_leader;
>  	int counters = 0;
> @@ -403,7 +516,7 @@ static bool tx2_uncore_validate_event_group(struct perf_event *event)
>  	 * If the group requires more counters than the HW has,
>  	 * it cannot ever be scheduled.
>  	 */
> -	return counters <= TX2_PMU_MAX_COUNTERS;
> +	return counters <= max_counters;
>  }
>  
>  
> @@ -439,7 +552,7 @@ static int tx2_uncore_event_init(struct perf_event *event)
>  	hwc->config = event->attr.config;
>  
>  	/* Validate the group */
> -	if (!tx2_uncore_validate_event_group(event))
> +	if (!tx2_uncore_validate_event_group(event, tx2_pmu->max_counters))
>  		return -EINVAL;
>  
>  	return 0;
> @@ -457,7 +570,8 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
>  	perf_event_update_userpage(event);
>  
>  	/* Start timer for first event */
> -	if (bitmap_weight(tx2_pmu->active_counters,
> +	if (tx2_pmu->type != PMU_TYPE_CCPI2 &&
> +			bitmap_weight(tx2_pmu->active_counters,
>  				tx2_pmu->max_counters) == 1) {
>  		hrtimer_start(&tx2_pmu->hrtimer,
>  			ns_to_ktime(tx2_pmu->hrtimer_interval),
> @@ -495,7 +609,8 @@ static int tx2_uncore_event_add(struct perf_event *event, int flags)
>  	if (hwc->idx < 0)
>  		return -EAGAIN;
>  
> -	tx2_pmu->events[hwc->idx] = event;
> +	if (tx2_pmu->type != PMU_TYPE_CCPI2)
> +		tx2_pmu->events[hwc->idx] = event;
>  	/* set counter control and data registers base address */
>  	tx2_pmu->init_cntr_base(event, tx2_pmu);
>  
> @@ -514,10 +629,14 @@ static void tx2_uncore_event_del(struct perf_event *event, int flags)
>  	tx2_uncore_event_stop(event, PERF_EF_UPDATE);
>  
>  	/* clear the assigned counter */
> -	free_counter(tx2_pmu, GET_COUNTERID(event));
> +	if (tx2_pmu->type == PMU_TYPE_CCPI2)
> +		free_counter(tx2_pmu, GET_COUNTERID_CCPI2(event));
> +	else
> +		free_counter(tx2_pmu, GET_COUNTERID(event));
>  
>  	perf_event_update_userpage(event);
> -	tx2_pmu->events[hwc->idx] = NULL;
> +	if (tx2_pmu->type != PMU_TYPE_CCPI2)
> +		tx2_pmu->events[hwc->idx] = NULL;
>  	hwc->idx = -1;
>  }
>  
> @@ -580,8 +699,12 @@ static int tx2_uncore_pmu_add_dev(struct tx2_uncore_pmu *tx2_pmu)
>  			cpu_online_mask);
>  
>  	tx2_pmu->cpu = cpu;
> -	hrtimer_init(&tx2_pmu->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> -	tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
> +	/* CCPI2 counters are 64 bit counters. */
> +	if (tx2_pmu->type != PMU_TYPE_CCPI2) {
> +		hrtimer_init(&tx2_pmu->hrtimer,
> +				CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> +		tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
> +	}

So I take it the CCPI2 also doesn't have an IRQ, and therefore you can't
hook up sampling events?

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
  2019-06-14 17:42 ` [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver Ganapatrao Kulkarni
@ 2019-06-27 10:01   ` Will Deacon
  2019-06-27 16:22     ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-06-27 10:01 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-doc, linux-kernel, linux-arm-kernel, Will.Deacon,
	mark.rutland, corbet, jnair, Robert.Richter, Jan.Glauber,
	gklkml16

On Fri, Jun 14, 2019 at 05:42:45PM +0000, Ganapatrao Kulkarni wrote:
> From: Ganapatrao Kulkarni <ganapatrao.kulkarni@marvell.com>
> 
> Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> 
> Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> ---
>  Documentation/perf/thunderx2-pmu.txt | 20 +++++++++++---------
>  1 file changed, 11 insertions(+), 9 deletions(-)
> 
> diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
> index dffc57143736..62243230abc3 100644
> --- a/Documentation/perf/thunderx2-pmu.txt
> +++ b/Documentation/perf/thunderx2-pmu.txt
> @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
>  =============================================================
>  
>  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> +Cavium Coherent Processor Interconnect (CCPI2).
>  
>  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
>  Events are counted for the default channel (i.e. channel 0) and prorated
>  to the total number of channels/tiles.
>  
> -The DMC and L3C support up to 4 counters. Counters are independently
> -programmable and can be started and stopped individually. Each counter
> -can be set to a different event. Counters are 32-bit and do not support
> -an overflow interrupt; they are read every 2 seconds.
> +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.

The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
counters.

> +Counters are independently programmable and can be started and stopped
> +individually. Each counter can be set to a different event. DMC and L3C
> +Counters are 32-bit and do not support an overflow interrupt; they are read

Counters -> counters

> +every 2 seconds. CCPI2 counters are 64-bit.

Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
two sentences as:

  None of the counters support an overflow interrupt and therefore sampling
  events are unsupported. The DMC and L3C counters are 32-bit and read every
  2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
  normal operation.

>  PMU UNCORE (perf) driver:
>  
>  The thunderx2_pmu driver registers per-socket perf PMUs for the DMC and
> -L3C devices.  Each PMU can be used to count up to 4 events
> -simultaneously. The PMUs provide a description of their available events
> -and configuration options under sysfs, see
> -/sys/devices/uncore_<l3c_S/dmc_S/>; S is the socket id.
> +L3C devices.  Each PMU can be used to count up to 4(DMC/L3C) or up to 8

Space between 4 and (

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
  2019-06-27 10:01   ` Will Deacon
@ 2019-06-27 16:22     ` Will Deacon
  2019-06-28  4:15       ` Ganapatrao Kulkarni
  0 siblings, 1 reply; 9+ messages in thread
From: Will Deacon @ 2019-06-27 16:22 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: linux-doc, linux-kernel, linux-arm-kernel, Will.Deacon,
	mark.rutland, corbet, jnair, Robert.Richter, Jan.Glauber,
	gklkml16

On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote:
> On Fri, Jun 14, 2019 at 05:42:45PM +0000, Ganapatrao Kulkarni wrote:
> > From: Ganapatrao Kulkarni <ganapatrao.kulkarni@marvell.com>
> > 
> > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> > 
> > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> > ---
> >  Documentation/perf/thunderx2-pmu.txt | 20 +++++++++++---------
> >  1 file changed, 11 insertions(+), 9 deletions(-)
> > 
> > diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
> > index dffc57143736..62243230abc3 100644
> > --- a/Documentation/perf/thunderx2-pmu.txt
> > +++ b/Documentation/perf/thunderx2-pmu.txt
> > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
> >  =============================================================
> >  
> >  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> > +Cavium Coherent Processor Interconnect (CCPI2).
> >  
> >  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
> >  Events are counted for the default channel (i.e. channel 0) and prorated
> >  to the total number of channels/tiles.
> >  
> > -The DMC and L3C support up to 4 counters. Counters are independently
> > -programmable and can be started and stopped individually. Each counter
> > -can be set to a different event. Counters are 32-bit and do not support
> > -an overflow interrupt; they are read every 2 seconds.
> > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.
> 
> The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
> counters.
> 
> > +Counters are independently programmable and can be started and stopped
> > +individually. Each counter can be set to a different event. DMC and L3C
> > +Counters are 32-bit and do not support an overflow interrupt; they are read
> 
> Counters -> counters
> 
> > +every 2 seconds. CCPI2 counters are 64-bit.
> 
> Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
> two sentences as:
> 
>   None of the counters support an overflow interrupt and therefore sampling
>   events are unsupported. The DMC and L3C counters are 32-bit and read every
>   2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
>   normal operation.

Mark reminded me that these are system PMUs and therefore sampling is
unsupported irrespective of the presence of an overflow interrupt, so you
can drop that part from the text.

Sorry for the confusion,

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver
  2019-06-27 16:22     ` Will Deacon
@ 2019-06-28  4:15       ` Ganapatrao Kulkarni
  0 siblings, 0 replies; 9+ messages in thread
From: Ganapatrao Kulkarni @ 2019-06-28  4:15 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	Will.Deacon, mark.rutland, corbet, jnair, jglauber, rrichter

Hi Will,

On Thu, Jun 27, 2019 at 9:52 PM Will Deacon <will@kernel.org> wrote:
>
> On Thu, Jun 27, 2019 at 11:01:18AM +0100, Will Deacon wrote:
> > On Fri, Jun 14, 2019 at 05:42:45PM +0000, Ganapatrao Kulkarni wrote:
> > > From: Ganapatrao Kulkarni <ganapatrao.kulkarni@marvell.com>
> > >
> > > Add documentation for Cavium Coherent Processor Interconnect (CCPI2) PMU.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> > > ---
> > >  Documentation/perf/thunderx2-pmu.txt | 20 +++++++++++---------
> > >  1 file changed, 11 insertions(+), 9 deletions(-)
> > >
> > > diff --git a/Documentation/perf/thunderx2-pmu.txt b/Documentation/perf/thunderx2-pmu.txt
> > > index dffc57143736..62243230abc3 100644
> > > --- a/Documentation/perf/thunderx2-pmu.txt
> > > +++ b/Documentation/perf/thunderx2-pmu.txt
> > > @@ -2,24 +2,26 @@ Cavium ThunderX2 SoC Performance Monitoring Unit (PMU UNCORE)
> > >  =============================================================
> > >
> > >  The ThunderX2 SoC PMU consists of independent, system-wide, per-socket
> > > -PMUs such as the Level 3 Cache (L3C) and DDR4 Memory Controller (DMC).
> > > +PMUs such as the Level 3 Cache (L3C), DDR4 Memory Controller (DMC) and
> > > +Cavium Coherent Processor Interconnect (CCPI2).
> > >
> > >  The DMC has 8 interleaved channels and the L3C has 16 interleaved tiles.
> > >  Events are counted for the default channel (i.e. channel 0) and prorated
> > >  to the total number of channels/tiles.
> > >
> > > -The DMC and L3C support up to 4 counters. Counters are independently
> > > -programmable and can be started and stopped individually. Each counter
> > > -can be set to a different event. Counters are 32-bit and do not support
> > > -an overflow interrupt; they are read every 2 seconds.
> > > +The DMC, L3C support up to 4 counters and CCPI2 support up to 8 counters.
> >
> > The DMC and L3C support up to 4 counters, while the CCPI2 supports up to 8
> > counters.
> >
> > > +Counters are independently programmable and can be started and stopped
> > > +individually. Each counter can be set to a different event. DMC and L3C
> > > +Counters are 32-bit and do not support an overflow interrupt; they are read
> >
> > Counters -> counters
> >
> > > +every 2 seconds. CCPI2 counters are 64-bit.
> >
> > Assuming CCPI2 also doesn't support an overflow interrupt, I'd reword these
> > two sentences as:
> >
> >   None of the counters support an overflow interrupt and therefore sampling
> >   events are unsupported. The DMC and L3C counters are 32-bit and read every
> >   2 seconds. The CCPI2 counters are 64-bit and assumed not to overflow in
> >   normal operation.
>

Thanks for the comments, will update in v2.
Yes, CCPI2 is 64bit counter and there is no overflow issue.

> Mark reminded me that these are system PMUs and therefore sampling is
> unsupported irrespective of the presence of an overflow interrupt, so you
> can drop that part from the text.

sure.
>
> Sorry for the confusion,
>
> Will

Thanks,
Ganapat

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
  2019-06-27  9:57   ` Will Deacon
@ 2019-06-28  5:39     ` Ganapatrao Kulkarni
  2019-06-28  9:08       ` Will Deacon
  0 siblings, 1 reply; 9+ messages in thread
From: Ganapatrao Kulkarni @ 2019-06-28  5:39 UTC (permalink / raw)
  To: Will Deacon
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	Will.Deacon, mark.rutland, corbet, jnair, rrichter, jglauber

Hi will,

On Thu, Jun 27, 2019 at 3:27 PM Will Deacon <will@kernel.org> wrote:
>
> Hi Ganapat,
>
> On Fri, Jun 14, 2019 at 05:42:46PM +0000, Ganapatrao Kulkarni wrote:
> > CCPI2 is a low-latency high-bandwidth serial interface for connecting
> > ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> >
> > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> > ---
> >  drivers/perf/thunderx2_pmu.c | 179 ++++++++++++++++++++++++++++++-----
> >  1 file changed, 157 insertions(+), 22 deletions(-)
> >
> > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> > index 43d76c85da56..3791ac9b897d 100644
> > --- a/drivers/perf/thunderx2_pmu.c
> > +++ b/drivers/perf/thunderx2_pmu.c
> > @@ -16,7 +16,9 @@
> >   * they need to be sampled before overflow(i.e, at every 2 seconds).
> >   */
> >
> > -#define TX2_PMU_MAX_COUNTERS         4
> > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4
>
> I find it odd that you rename this...

i am not sure, how to avoid this since dmc/l3c have 4 counters and ccpi2 has 8.
i will try to make this better in v2.
>
> > +#define TX2_PMU_CCPI2_MAX_COUNTERS   8
> > +
> >  #define TX2_PMU_DMC_CHANNELS         8
> >  #define TX2_PMU_L3_TILES             16
> >
> > @@ -28,11 +30,22 @@
> >    */
> >  #define DMC_EVENT_CFG(idx, val)              ((val) << (((idx) * 8) + 1))
> >
> > +#define GET_EVENTID_CCPI2(ev)                ((ev->hw.config) & 0x1ff)
> > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> > +#define GET_COUNTERID_CCPI2(ev)              ((ev->hw.idx) & 0x7)
> > +#define CCPI2_COUNTER_OFFSET         8
>
>
> ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
> events. Saying that, if you have the event you can figure out its type,
> so could you avoid the need for additional macros entirely and just use
> the correct masks based on the corresponding PMU type? That might also
> avoid some of the conditional control flow you're introducing elsewhere.

sure, i will add mask as argument to macro.
>
> >  #define L3C_COUNTER_CTL                      0xA8
> >  #define L3C_COUNTER_DATA             0xAC
> >  #define DMC_COUNTER_CTL                      0x234
> >  #define DMC_COUNTER_DATA             0x240
> >
> > +#define CCPI2_PERF_CTL                       0x108
> > +#define CCPI2_COUNTER_CTL            0x10C
> > +#define CCPI2_COUNTER_SEL            0x12c
> > +#define CCPI2_COUNTER_DATA_L         0x130
> > +#define CCPI2_COUNTER_DATA_H         0x134
> > +
> >  /* L3C event IDs */
> >  #define L3_EVENT_READ_REQ            0xD
> >  #define L3_EVENT_WRITEBACK_REQ               0xE
> > @@ -51,9 +64,16 @@
> >  #define DMC_EVENT_READ_TXNS          0xF
> >  #define DMC_EVENT_MAX                        0x10
> >
> > +#define CCPI2_EVENT_REQ_PKT_SENT     0x3D
> > +#define CCPI2_EVENT_SNOOP_PKT_SENT   0x65
> > +#define CCPI2_EVENT_DATA_PKT_SENT    0x105
> > +#define CCPI2_EVENT_GIC_PKT_SENT     0x12D
> > +#define CCPI2_EVENT_MAX                      0x200
> > +
> >  enum tx2_uncore_type {
> >       PMU_TYPE_L3C,
> >       PMU_TYPE_DMC,
> > +     PMU_TYPE_CCPI2,
> >       PMU_TYPE_INVALID,
> >  };
> >
> > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
> >       u32 max_events;
> >       u64 hrtimer_interval;
> >       void __iomem *base;
> > -     DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> > -     struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> > +     DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> > +     struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];
>
> Hmm, that looks very odd. Why are they now different sizes?

events[ ] is used to hold reference to active events to use in timer
callback, which is not applicable to ccpi2, hence 4.
active_counters is set to max of both. i.e, 8. i will try to make it
better readable in v2.

>
> >       struct device *dev;
> >       struct hrtimer hrtimer;
> >       const struct attribute_group **attr_groups;
> > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
> >       return container_of(pmu, struct tx2_uncore_pmu, pmu);
> >  }
> >
> > -PMU_FORMAT_ATTR(event,       "config:0-4");
> > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)                    \
> > +static ssize_t                                                               \
> > +__tx2_pmu_##_var##_show(struct device *dev,                          \
> > +                            struct device_attribute *attr,           \
> > +                            char *page)                              \
> > +{                                                                    \
> > +     BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);                     \
> > +     return sprintf(page, _format "\n");                             \
> > +}                                                                    \
> > +                                                                     \
> > +static struct device_attribute format_attr_##_var =                  \
> > +     __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
> > +
> > +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
> > +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");
>
> This doesn't look right. Can perf deal with overlapping fields? It seems
> wrong that we'd allow the user to specify both, for example.

I am not sure what is the issue here? both are different PMUs
root@SBR-26> cat /sys/bus/event_source/devices/uncore_dmc_0/format/event
config:0-4
root@SBR-26> cat /sys/bus/event_source/devices/uncore_ccpi2_0/format/event
config:0-9

>
> >
> >  static struct attribute *l3c_pmu_format_attrs[] = {
> >       &format_attr_event.attr,
> > @@ -104,6 +138,11 @@ static struct attribute *dmc_pmu_format_attrs[] = {
> >       NULL,
> >  };
> >
> > +static struct attribute *ccpi2_pmu_format_attrs[] = {
> > +     &format_attr_event_ccpi2.attr,
> > +     NULL,
> > +};
> > +
> >  static const struct attribute_group l3c_pmu_format_attr_group = {
> >       .name = "format",
> >       .attrs = l3c_pmu_format_attrs,
> > @@ -114,6 +153,11 @@ static const struct attribute_group dmc_pmu_format_attr_group = {
> >       .attrs = dmc_pmu_format_attrs,
> >  };
> >
> > +static const struct attribute_group ccpi2_pmu_format_attr_group = {
> > +     .name = "format",
> > +     .attrs = ccpi2_pmu_format_attrs,
> > +};
> > +
> >  /*
> >   * sysfs event attributes
> >   */
> > @@ -164,6 +208,19 @@ static struct attribute *dmc_pmu_events_attrs[] = {
> >       NULL,
> >  };
> >
> > +TX2_EVENT_ATTR(req_pktsent, CCPI2_EVENT_REQ_PKT_SENT);
> > +TX2_EVENT_ATTR(snoop_pktsent, CCPI2_EVENT_SNOOP_PKT_SENT);
> > +TX2_EVENT_ATTR(data_pktsent, CCPI2_EVENT_DATA_PKT_SENT);
> > +TX2_EVENT_ATTR(gic_pktsent, CCPI2_EVENT_GIC_PKT_SENT);
> > +
> > +static struct attribute *ccpi2_pmu_events_attrs[] = {
> > +     &tx2_pmu_event_attr_req_pktsent.attr.attr,
> > +     &tx2_pmu_event_attr_snoop_pktsent.attr.attr,
> > +     &tx2_pmu_event_attr_data_pktsent.attr.attr,
> > +     &tx2_pmu_event_attr_gic_pktsent.attr.attr,
> > +     NULL,
> > +};
> > +
> >  static const struct attribute_group l3c_pmu_events_attr_group = {
> >       .name = "events",
> >       .attrs = l3c_pmu_events_attrs,
> > @@ -174,6 +231,11 @@ static const struct attribute_group dmc_pmu_events_attr_group = {
> >       .attrs = dmc_pmu_events_attrs,
> >  };
> >
> > +static const struct attribute_group ccpi2_pmu_events_attr_group = {
> > +     .name = "events",
> > +     .attrs = ccpi2_pmu_events_attrs,
> > +};
> > +
> >  /*
> >   * sysfs cpumask attributes
> >   */
> > @@ -213,6 +275,13 @@ static const struct attribute_group *dmc_pmu_attr_groups[] = {
> >       NULL
> >  };
> >
> > +static const struct attribute_group *ccpi2_pmu_attr_groups[] = {
> > +     &ccpi2_pmu_format_attr_group,
> > +     &pmu_cpumask_attr_group,
> > +     &ccpi2_pmu_events_attr_group,
> > +     NULL
> > +};
> > +
> >  static inline u32 reg_readl(unsigned long addr)
> >  {
> >       return readl((void __iomem *)addr);
> > @@ -265,6 +334,17 @@ static void init_cntr_base_dmc(struct perf_event *event,
> >               + DMC_COUNTER_DATA + (0xc * GET_COUNTERID(event));
> >  }
> >
> > +static void init_cntr_base_ccpi2(struct perf_event *event,
> > +             struct tx2_uncore_pmu *tx2_pmu)
> > +{
> > +
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     hwc->config_base = (unsigned long)tx2_pmu->base
> > +             + CCPI2_COUNTER_CTL + (4 * GET_COUNTERID_CCPI2(event));
> > +     hwc->event_base =  (unsigned long)tx2_pmu->base;
> > +}
> > +
> >  static void uncore_start_event_l3c(struct perf_event *event, int flags)
> >  {
> >       u32 val;
> > @@ -312,6 +392,29 @@ static void uncore_stop_event_dmc(struct perf_event *event)
> >       reg_writel(val, hwc->config_base);
> >  }
> >
> > +static void uncore_start_event_ccpi2(struct perf_event *event, int flags)
> > +{
> > +     u32 val;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     /* Bit [09:00] to set event id, set level and type to 1 */
> > +     val = reg_readl(hwc->config_base);
> > +     reg_writel((val & ~0xFFF) | (3 << 10) |
> > +                     GET_EVENTID_CCPI2(event), hwc->config_base);
> > +     /* reset[4], enable[0] and start[1] counters */
> > +     reg_writel(0x13, hwc->event_base + CCPI2_PERF_CTL);
> > +     local64_set(&event->hw.prev_count, 0ULL);
> > +}
> > +
> > +static void uncore_stop_event_ccpi2(struct perf_event *event)
> > +{
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     /* disable and stop counter */
> > +     reg_writel(0, hwc->event_base + CCPI2_PERF_CTL);
>
> How come you need to clear the event register here? You don't do that for
> the DMC/L3C paths.

unfortunately these blocks were designed from different folks and they
did not use same protocol/format.
>
> > +     reg_writel(0, hwc->config_base);
>
> When starting event you're careful to update this using a read-modify-write
> sequence. Why is it safe to zero the whole thing when stopping?

thanks, it is not required.
>
> > +}
> > +
> >  static void tx2_uncore_event_update(struct perf_event *event)
> >  {
> >       s64 prev, delta, new = 0;
> > @@ -323,12 +426,20 @@ static void tx2_uncore_event_update(struct perf_event *event)
> >       tx2_pmu = pmu_to_tx2_pmu(event->pmu);
> >       type = tx2_pmu->type;
> >       prorate_factor = tx2_pmu->prorate_factor;
> > -
> > -     new = reg_readl(hwc->event_base);
> > -     prev = local64_xchg(&hwc->prev_count, new);
> > -
> > -     /* handles rollover of 32 bit counter */
> > -     delta = (u32)(((1UL << 32) - prev) + new);
> > +     if (type == PMU_TYPE_CCPI2) {
> > +             reg_writel(CCPI2_COUNTER_OFFSET + GET_COUNTERID_CCPI2(event),
> > +                                     hwc->event_base + CCPI2_COUNTER_SEL);
> > +             new = reg_readl(hwc->event_base + CCPI2_COUNTER_DATA_L);
> > +             new |= (u64)reg_readl(hwc->event_base +
> > +                             CCPI2_COUNTER_DATA_H) << 32;
>
> Can you not access the event register using a 64-bit read?

thanks, I will update to readq.
>
> > +             prev = local64_xchg(&hwc->prev_count, new);
> > +             delta = new - prev;
> > +     } else {
> > +             new = reg_readl(hwc->event_base);
> > +             prev = local64_xchg(&hwc->prev_count, new);
> > +             /* handles rollover of 32 bit counter */
> > +             delta = (u32)(((1UL << 32) - prev) + new);
> > +     }
> >
> >       /* DMC event data_transfers granularity is 16 Bytes, convert it to 64 */
> >       if (type == PMU_TYPE_DMC &&
> > @@ -351,6 +462,7 @@ static enum tx2_uncore_type get_tx2_pmu_type(struct acpi_device *adev)
> >       } devices[] = {
> >               {"CAV901D", PMU_TYPE_L3C},
> >               {"CAV901F", PMU_TYPE_DMC},
> > +             {"CAV901E", PMU_TYPE_CCPI2},
> >               {"", PMU_TYPE_INVALID}
> >       };
> >
> > @@ -380,7 +492,8 @@ static bool tx2_uncore_validate_event(struct pmu *pmu,
> >   * Make sure the group of events can be scheduled at once
> >   * on the PMU.
> >   */
> > -static bool tx2_uncore_validate_event_group(struct perf_event *event)
> > +static bool tx2_uncore_validate_event_group(struct perf_event *event,
> > +             int max_counters)
> >  {
> >       struct perf_event *sibling, *leader = event->group_leader;
> >       int counters = 0;
> > @@ -403,7 +516,7 @@ static bool tx2_uncore_validate_event_group(struct perf_event *event)
> >        * If the group requires more counters than the HW has,
> >        * it cannot ever be scheduled.
> >        */
> > -     return counters <= TX2_PMU_MAX_COUNTERS;
> > +     return counters <= max_counters;
> >  }
> >
> >
> > @@ -439,7 +552,7 @@ static int tx2_uncore_event_init(struct perf_event *event)
> >       hwc->config = event->attr.config;
> >
> >       /* Validate the group */
> > -     if (!tx2_uncore_validate_event_group(event))
> > +     if (!tx2_uncore_validate_event_group(event, tx2_pmu->max_counters))
> >               return -EINVAL;
> >
> >       return 0;
> > @@ -457,7 +570,8 @@ static void tx2_uncore_event_start(struct perf_event *event, int flags)
> >       perf_event_update_userpage(event);
> >
> >       /* Start timer for first event */
> > -     if (bitmap_weight(tx2_pmu->active_counters,
> > +     if (tx2_pmu->type != PMU_TYPE_CCPI2 &&
> > +                     bitmap_weight(tx2_pmu->active_counters,
> >                               tx2_pmu->max_counters) == 1) {
> >               hrtimer_start(&tx2_pmu->hrtimer,
> >                       ns_to_ktime(tx2_pmu->hrtimer_interval),
> > @@ -495,7 +609,8 @@ static int tx2_uncore_event_add(struct perf_event *event, int flags)
> >       if (hwc->idx < 0)
> >               return -EAGAIN;
> >
> > -     tx2_pmu->events[hwc->idx] = event;
> > +     if (tx2_pmu->type != PMU_TYPE_CCPI2)
> > +             tx2_pmu->events[hwc->idx] = event;
> >       /* set counter control and data registers base address */
> >       tx2_pmu->init_cntr_base(event, tx2_pmu);
> >
> > @@ -514,10 +629,14 @@ static void tx2_uncore_event_del(struct perf_event *event, int flags)
> >       tx2_uncore_event_stop(event, PERF_EF_UPDATE);
> >
> >       /* clear the assigned counter */
> > -     free_counter(tx2_pmu, GET_COUNTERID(event));
> > +     if (tx2_pmu->type == PMU_TYPE_CCPI2)
> > +             free_counter(tx2_pmu, GET_COUNTERID_CCPI2(event));
> > +     else
> > +             free_counter(tx2_pmu, GET_COUNTERID(event));
> >
> >       perf_event_update_userpage(event);
> > -     tx2_pmu->events[hwc->idx] = NULL;
> > +     if (tx2_pmu->type != PMU_TYPE_CCPI2)
> > +             tx2_pmu->events[hwc->idx] = NULL;
> >       hwc->idx = -1;
> >  }
> >
> > @@ -580,8 +699,12 @@ static int tx2_uncore_pmu_add_dev(struct tx2_uncore_pmu *tx2_pmu)
> >                       cpu_online_mask);
> >
> >       tx2_pmu->cpu = cpu;
> > -     hrtimer_init(&tx2_pmu->hrtimer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > -     tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
> > +     /* CCPI2 counters are 64 bit counters. */
> > +     if (tx2_pmu->type != PMU_TYPE_CCPI2) {
> > +             hrtimer_init(&tx2_pmu->hrtimer,
> > +                             CLOCK_MONOTONIC, HRTIMER_MODE_REL);
> > +             tx2_pmu->hrtimer.function = tx2_hrtimer_callback;
> > +     }
>
> So I take it the CCPI2 also doesn't have an IRQ, and therefore you can't
> hook up sampling events?

Yes these too are system wide PMUs, 64 bit counters and do not overflow.

>
> Will

thanks,
Ganapat

^ permalink raw reply	[flat|nested] 9+ messages in thread

* Re: [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver.
  2019-06-28  5:39     ` Ganapatrao Kulkarni
@ 2019-06-28  9:08       ` Will Deacon
  0 siblings, 0 replies; 9+ messages in thread
From: Will Deacon @ 2019-06-28  9:08 UTC (permalink / raw)
  To: Ganapatrao Kulkarni
  Cc: Ganapatrao Kulkarni, linux-doc, linux-kernel, linux-arm-kernel,
	Will.Deacon, mark.rutland, corbet, jnair, rrichter, jglauber

Hi again, Ganapat,

Thanks for the quick reply.

On Fri, Jun 28, 2019 at 11:09:33AM +0530, Ganapatrao Kulkarni wrote:
> On Thu, Jun 27, 2019 at 3:27 PM Will Deacon <will@kernel.org> wrote:
> > On Fri, Jun 14, 2019 at 05:42:46PM +0000, Ganapatrao Kulkarni wrote:
> > > CCPI2 is a low-latency high-bandwidth serial interface for connecting
> > > ThunderX2 processors. This patch adds support to capture CCPI2 perf events.
> > >
> > > Signed-off-by: Ganapatrao Kulkarni <gkulkarni@marvell.com>
> > > ---
> > >  drivers/perf/thunderx2_pmu.c | 179 ++++++++++++++++++++++++++++++-----
> > >  1 file changed, 157 insertions(+), 22 deletions(-)
> > >
> > > diff --git a/drivers/perf/thunderx2_pmu.c b/drivers/perf/thunderx2_pmu.c
> > > index 43d76c85da56..3791ac9b897d 100644
> > > --- a/drivers/perf/thunderx2_pmu.c
> > > +++ b/drivers/perf/thunderx2_pmu.c
> > > @@ -16,7 +16,9 @@
> > >   * they need to be sampled before overflow(i.e, at every 2 seconds).
> > >   */
> > >
> > > -#define TX2_PMU_MAX_COUNTERS         4
> > > +#define TX2_PMU_DMC_L3C_MAX_COUNTERS 4
> >
> > I find it odd that you rename this...
> 
> i am not sure, how to avoid this since dmc/l3c have 4 counters and ccpi2 has 8.
> i will try to make this better in v2.
> >
> > > +#define TX2_PMU_CCPI2_MAX_COUNTERS   8
> > > +
> > >  #define TX2_PMU_DMC_CHANNELS         8
> > >  #define TX2_PMU_L3_TILES             16
> > >
> > > @@ -28,11 +30,22 @@
> > >    */
> > >  #define DMC_EVENT_CFG(idx, val)              ((val) << (((idx) * 8) + 1))
> > >
> > > +#define GET_EVENTID_CCPI2(ev)                ((ev->hw.config) & 0x1ff)
> > > +/* bits[3:0] to select counters, starts from 8, bit[3] set always. */
> > > +#define GET_COUNTERID_CCPI2(ev)              ((ev->hw.idx) & 0x7)
> > > +#define CCPI2_COUNTER_OFFSET         8
> >
> >
> > ... but leave GET_EVENTID alone, even though it only applies to DMC/L3C
> > events. Saying that, if you have the event you can figure out its type,
> > so could you avoid the need for additional macros entirely and just use
> > the correct masks based on the corresponding PMU type? That might also
> > avoid some of the conditional control flow you're introducing elsewhere.
> 
> sure, i will add mask as argument to macro.
> >
> > >  #define L3C_COUNTER_CTL                      0xA8
> > >  #define L3C_COUNTER_DATA             0xAC
> > >  #define DMC_COUNTER_CTL                      0x234
> > >  #define DMC_COUNTER_DATA             0x240
> > >
> > > +#define CCPI2_PERF_CTL                       0x108
> > > +#define CCPI2_COUNTER_CTL            0x10C
> > > +#define CCPI2_COUNTER_SEL            0x12c
> > > +#define CCPI2_COUNTER_DATA_L         0x130
> > > +#define CCPI2_COUNTER_DATA_H         0x134
> > > +
> > >  /* L3C event IDs */
> > >  #define L3_EVENT_READ_REQ            0xD
> > >  #define L3_EVENT_WRITEBACK_REQ               0xE
> > > @@ -51,9 +64,16 @@
> > >  #define DMC_EVENT_READ_TXNS          0xF
> > >  #define DMC_EVENT_MAX                        0x10
> > >
> > > +#define CCPI2_EVENT_REQ_PKT_SENT     0x3D
> > > +#define CCPI2_EVENT_SNOOP_PKT_SENT   0x65
> > > +#define CCPI2_EVENT_DATA_PKT_SENT    0x105
> > > +#define CCPI2_EVENT_GIC_PKT_SENT     0x12D
> > > +#define CCPI2_EVENT_MAX                      0x200
> > > +
> > >  enum tx2_uncore_type {
> > >       PMU_TYPE_L3C,
> > >       PMU_TYPE_DMC,
> > > +     PMU_TYPE_CCPI2,
> > >       PMU_TYPE_INVALID,
> > >  };
> > >
> > > @@ -73,8 +93,8 @@ struct tx2_uncore_pmu {
> > >       u32 max_events;
> > >       u64 hrtimer_interval;
> > >       void __iomem *base;
> > > -     DECLARE_BITMAP(active_counters, TX2_PMU_MAX_COUNTERS);
> > > -     struct perf_event *events[TX2_PMU_MAX_COUNTERS];
> > > +     DECLARE_BITMAP(active_counters, TX2_PMU_CCPI2_MAX_COUNTERS);
> > > +     struct perf_event *events[TX2_PMU_DMC_L3C_MAX_COUNTERS];
> >
> > Hmm, that looks very odd. Why are they now different sizes?
> 
> events[ ] is used to hold reference to active events to use in timer
> callback, which is not applicable to ccpi2, hence 4.
> active_counters is set to max of both. i.e, 8. i will try to make it
> better readable in v2.

Thanks. I suspect renaming the field would help a lot, or perhaps reworking
your data structures so that you have a union of ccpi2 and dmc/l2c
structures where necessary.

> > >       struct device *dev;
> > >       struct hrtimer hrtimer;
> > >       const struct attribute_group **attr_groups;
> > > @@ -92,7 +112,21 @@ static inline struct tx2_uncore_pmu *pmu_to_tx2_pmu(struct pmu *pmu)
> > >       return container_of(pmu, struct tx2_uncore_pmu, pmu);
> > >  }
> > >
> > > -PMU_FORMAT_ATTR(event,       "config:0-4");
> > > +#define TX2_PMU_FORMAT_ATTR(_var, _name, _format)                    \
> > > +static ssize_t                                                               \
> > > +__tx2_pmu_##_var##_show(struct device *dev,                          \
> > > +                            struct device_attribute *attr,           \
> > > +                            char *page)                              \
> > > +{                                                                    \
> > > +     BUILD_BUG_ON(sizeof(_format) >= PAGE_SIZE);                     \
> > > +     return sprintf(page, _format "\n");                             \
> > > +}                                                                    \
> > > +                                                                     \
> > > +static struct device_attribute format_attr_##_var =                  \
> > > +     __ATTR(_name, 0444, __tx2_pmu_##_var##_show, NULL)
> > > +
> > > +TX2_PMU_FORMAT_ATTR(event, event, "config:0-4");
> > > +TX2_PMU_FORMAT_ATTR(event_ccpi2, event, "config:0-9");
> >
> > This doesn't look right. Can perf deal with overlapping fields? It seems
> > wrong that we'd allow the user to specify both, for example.
> 
> I am not sure what is the issue here? both are different PMUs
> root@SBR-26> cat /sys/bus/event_source/devices/uncore_dmc_0/format/event
> config:0-4
> root@SBR-26> cat /sys/bus/event_source/devices/uncore_ccpi2_0/format/event
> config:0-9

Ah, sorry about that. I got _var and _name the wrong way around and thought
you were introducing a file called event_ccpi2! What you have looks fine.

Will

^ permalink raw reply	[flat|nested] 9+ messages in thread

end of thread, other threads:[~2019-06-28  9:08 UTC | newest]

Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2019-06-14 17:42 [PATCH 0/2] Add CCPI2 PMU support Ganapatrao Kulkarni
2019-06-14 17:42 ` [PATCH 1/2] Documentation: perf: Update documentation for ThunderX2 PMU uncore driver Ganapatrao Kulkarni
2019-06-27 10:01   ` Will Deacon
2019-06-27 16:22     ` Will Deacon
2019-06-28  4:15       ` Ganapatrao Kulkarni
2019-06-14 17:42 ` [PATCH 2/2] drivers/perf: Add CCPI2 PMU support in ThunderX2 UNCORE driver Ganapatrao Kulkarni
2019-06-27  9:57   ` Will Deacon
2019-06-28  5:39     ` Ganapatrao Kulkarni
2019-06-28  9:08       ` Will Deacon

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).