linux-kernel.vger.kernel.org archive mirror
* [PATCH 0/2] perf: ARM CoreSight PMU support
@ 2022-05-09  0:28 Besar Wicaksono
  2022-05-09  0:28 ` [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
                   ` (3 more replies)
  0 siblings, 4 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-09  0:28 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add driver support for the ARM CoreSight PMU device, along with event
attributes for the NVIDIA implementation. The code is based on the ARM
CoreSight PMU architecture and the ACPI Arm Performance Monitoring Unit
table (APMT) specifications below:
 * ARM CoreSight PMU:
        https://developer.arm.com/documentation/ihi0091/latest
 * APMT: https://developer.arm.com/documentation/den0117/latest

Notes:
 * There is a concern about the naming of the PMU device.
   Currently the driver probes the "arm-coresight-pmu" device; however, the
   APMT spec supports different kinds of CoreSight-PMU-based implementations,
   so it is open for discussion whether the name can stay or a more "generic"
   name is required. Please see the following thread:
   http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html

Besar Wicaksono (2):
  perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute

 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   10 +
 drivers/perf/coresight_pmu/Makefile           |    7 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
 9 files changed, 1802 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h

-- 
2.17.1



* [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-09  0:28 [PATCH 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
@ 2022-05-09  0:28 ` Besar Wicaksono
  2022-05-09 12:13   ` Robin Murphy
  2022-05-09  0:28 ` [PATCH 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-09  0:28 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add support for the ARM CoreSight PMU driver framework and interfaces.
The driver provides a generic implementation for operating uncore PMUs
based on the ARM CoreSight PMU architecture. The driver also provides an
interface to get vendor/implementation-specific information, for example
event attributes and formatting.

The specifications used in this implementation can be found below:
 * ACPI Arm Performance Monitoring Unit table:
        https://developer.arm.com/documentation/den0117/latest
 * ARM CoreSight PMU architecture:
        https://developer.arm.com/documentation/ihi0091/latest

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   10 +
 drivers/perf/coresight_pmu/Makefile           |    6 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1315 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
 7 files changed, 1482 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2ca8b1b336d2..8f2120182b25 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
 CONFIG_PHY_TEGRA_XUSB=y
 CONFIG_PHY_AM654_SERDES=m
 CONFIG_PHY_J721E_WIZ=m
+CONFIG_ARM_CORESIGHT_PMU=y
 CONFIG_ARM_SMMU_V3_PMU=m
 CONFIG_FSL_IMX8_DDR_PMU=m
 CONFIG_QCOM_L2_PMU=y
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 1e2d69453771..c4e7cd5b4162 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
 	  Enable perf support for Marvell DDR Performance monitoring
 	  event on CN10K platform.
 
+source "drivers/perf/coresight_pmu/Kconfig"
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 57a279c61df5..4126a04b5583 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
diff --git a/drivers/perf/coresight_pmu/Kconfig b/drivers/perf/coresight_pmu/Kconfig
new file mode 100644
index 000000000000..487dfee71ad1
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Kconfig
@@ -0,0 +1,10 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+
+config ARM_CORESIGHT_PMU
+	tristate "ARM Coresight PMU"
+	depends on ARM64 && ACPI_APMT
+	help
+	  Provides support for Performance Monitoring Unit (PMU) events based on
+	  ARM CoreSight PMU architecture.
\ No newline at end of file
diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
new file mode 100644
index 000000000000..a2a7a5fbbc16
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -0,0 +1,6 @@
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+#
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
+	arm_coresight_pmu.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
new file mode 100644
index 000000000000..1e9553d29717
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -0,0 +1,1315 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM CoreSight PMU driver.
+ *
+ * This driver adds support for uncore PMU based on ARM CoreSight Performance
+ * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
+ * like other uncore PMUs, it does not support process specific events and
+ * cannot be used in sampling mode.
+ *
+ * This code is based on other uncore PMU drivers such as the ARM DSU PMU. It
+ * provides a generic implementation for operating the PMU according to the
+ * CoreSight PMU architecture and the ACPI ARM PMU table (APMT) documents:
+ *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
+ *   - APMT document number: ARM DEN0117.
+ * The description of the PMU, such as the device identification, available
+ * events, and configuration options, is vendor specific. The driver provides
+ * an interface for vendor-specific code to supply this information, allowing
+ * the driver to be shared with PMUs from different vendors.
+ *
+ * CoreSight PMU devices are named arm_coresight_pmu<node_id>, where <node_id>
+ * is the APMT node id. The description of the device, such as the identifier,
+ * supported events, and formats, can be found in sysfs under
+ * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
+ *
+ * The user should refer to the vendor technical documentation to get details
+ * about the supported events.
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <linux/ctype.h>
+#include <linux/interrupt.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <acpi/processor.h>
+
+#include "arm_coresight_pmu.h"
+
+#define PMUNAME "arm_coresight_pmu"
+
+#define CORESIGHT_CPUMASK_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_cpumask_show,		\
+			   (unsigned long)_config)
+
+/*
+ * Register offsets based on the CoreSight Performance Monitoring Unit
+ * Architecture, document number: ARM-ECM-0640169 00alp6
+ */
+#define PMEVCNTR_LO					0x0
+#define PMEVCNTR_HI					0x4
+#define PMEVTYPER					0x400
+#define PMCCFILTR					0x47C
+#define PMEVFILTR					0xA00
+#define PMCNTENSET					0xC00
+#define PMCNTENCLR					0xC20
+#define PMINTENSET					0xC40
+#define PMINTENCLR					0xC60
+#define PMOVSCLR					0xC80
+#define PMOVSSET					0xCC0
+#define PMCFGR						0xE00
+#define PMCR						0xE04
+#define PMIIDR						0xE08
+
+/* PMCFGR register field */
+#define PMCFGR_NCG_SHIFT				28
+#define PMCFGR_NCG_MASK					0xf
+#define PMCFGR_HDBG					BIT(24)
+#define PMCFGR_TRO					BIT(23)
+#define PMCFGR_SS					BIT(22)
+#define PMCFGR_FZO					BIT(21)
+#define PMCFGR_MSI					BIT(20)
+#define PMCFGR_UEN					BIT(19)
+#define PMCFGR_NA					BIT(17)
+#define PMCFGR_EX					BIT(16)
+#define PMCFGR_CCD					BIT(15)
+#define PMCFGR_CC					BIT(14)
+#define PMCFGR_SIZE_SHIFT				8
+#define PMCFGR_SIZE_MASK				0x3f
+#define PMCFGR_N_SHIFT					0
+#define PMCFGR_N_MASK					0xff
+
+/* PMCR register field */
+#define PMCR_TRO					BIT(11)
+#define PMCR_HDBG					BIT(10)
+#define PMCR_FZO					BIT(9)
+#define PMCR_NA						BIT(8)
+#define PMCR_DP						BIT(5)
+#define PMCR_X						BIT(4)
+#define PMCR_D						BIT(3)
+#define PMCR_C						BIT(2)
+#define PMCR_P						BIT(1)
+#define PMCR_E						BIT(0)
+
+/* PMIIDR register field */
+#define PMIIDR_IMPLEMENTER_MASK				0xFFF
+#define PMIIDR_PRODUCTID_MASK				0xFFF
+#define PMIIDR_PRODUCTID_SHIFT				20
+
+/* Each SET/CLR register supports up to 32 counters. */
+#define CORESIGHT_SET_CLR_REG_COUNTER_NUM		32
+#define CORESIGHT_SET_CLR_REG_COUNTER_SHIFT		5
+
+/* The number of 32-bit SET/CLR registers that can be supported. */
+#define CORESIGHT_SET_CLR_REG_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
+
+static_assert((CORESIGHT_SET_CLR_REG_MAX_NUM *
+	       CORESIGHT_SET_CLR_REG_COUNTER_NUM) >=
+	      CORESIGHT_PMU_MAX_HW_CNTRS);
+
+/* Convert counter idx into SET/CLR register number. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx)				\
+	(idx >> CORESIGHT_SET_CLR_REG_COUNTER_SHIFT)
+
+/* Convert counter idx into SET/CLR register bit. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx)				\
+	(idx & (CORESIGHT_SET_CLR_REG_COUNTER_NUM - 1))
+
+#define CORESIGHT_ACTIVE_CPU_MASK			0x0
+#define CORESIGHT_ASSOCIATED_CPU_MASK			0x1
+
+#define CORESIGHT_EVENT_MASK				0xFFFFFFFFULL
+#define CORESIGHT_FILTER_MASK				0xFFFFFFFFULL
+#define CORESIGHT_FILTER_SHIFT				32ULL
+
+/* Check if field f in flags is set with value v */
+#define CHECK_APMT_FLAG(flags, f, v) \
+	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
+
+static unsigned long coresight_pmu_cpuhp_state;
+
+/*
+ * In the CoreSight PMU architecture, all of the MMIO registers are 32-bit
+ * except the counter registers. A counter register can be implemented as a
+ * 32-bit or 64-bit register depending on the value of the PMCFGR.SIZE field.
+ * For 64-bit access, single-copy 64-bit atomic support is implementation
+ * defined. An APMT node flag indicates whether the PMU supports 64-bit
+ * single-copy atomic access. If it does not, the driver treats the register
+ * as a pair of 32-bit registers.
+ */
+
+/*
+ * Read 32-bit register.
+ *
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ * @return 32-bit value of the register.
+ */
+static inline u32 read_reg32(void __iomem *base, u32 offset)
+{
+	return readl(base + offset);
+}
+
+/*
+ * Read 64-bit register using single 64-bit atomic copy.
+ *
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ * @return 64-bit value of the register.
+ */
+static u64 read_reg64(void __iomem *base, u32 offset)
+{
+	return readq(base + offset);
+}
+
+/*
+ * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
+ *
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ * @return 64-bit value of the register pair.
+ */
+static u64 read_reg64_hilohi(void __iomem *base, u32 offset)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use high-low-high sequence to avoid tearing */
+	do {
+		val_hi = read_reg32(base, offset + 4);
+		val_lo = read_reg32(base, offset);
+	} while (val_hi != read_reg32(base, offset + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+/*
+ * Write to 32-bit register.
+ *
+ * @val     : 32-bit value to write.
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ */
+static inline void write_reg32(u32 val, void __iomem *base, u32 offset)
+{
+	writel(val, base + offset);
+}
+
+/*
+ * Write to 64-bit register using single 64-bit atomic copy.
+ *
+ * @val     : 64-bit value to write.
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ */
+static void write_reg64(u64 val, void __iomem *base, u32 offset)
+{
+	writeq(val, base + offset);
+}
+
+/*
+ * Write to 64-bit register as a pair of 32-bit registers.
+ *
+ * @val     : 64-bit value to write.
+ * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
+ * @offset  : register offset.
+ *
+ */
+static void write_reg64_lohi(u64 val, void __iomem *base, u32 offset)
+{
+	u32 val_lo, val_hi;
+
+	val_hi = upper_32_bits(val);
+	val_lo = lower_32_bits(val);
+
+	write_reg32(val_lo, base, offset);
+	write_reg32(val_hi, base, offset + 4);
+}
+
+/* Check if cycle counter is supported. */
+static inline bool support_cc(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr & PMCFGR_CC);
+}
+
+/* Get counter size. */
+static inline u32 pmcfgr_size(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_SIZE_SHIFT) & PMCFGR_SIZE_MASK;
+}
+
+/* Check if counter is implemented as 64-bit register. */
+static inline bool
+use_64b_counter_reg(const struct coresight_pmu *coresight_pmu)
+{
+	return (pmcfgr_size(coresight_pmu) > 31);
+}
+
+/* Get number of counters, minus one. */
+static inline u32 pmcfgr_n(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_N_SHIFT) & PMCFGR_N_MASK;
+}
+
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "event=0x%llx\n",
+			  (unsigned long long)eattr->var);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_event_show);
+
+/*
+ * Event list for PMUs that do not support the cycle counter. The CoreSight
+ * PMU spec does not currently define standard events, so the list is empty.
+ */
+static struct attribute *coresight_pmu_event_attrs[] = {
+	NULL,
+};
+
+/* Event list of PMU supporting cycle counter. */
+static struct attribute *coresight_pmu_event_attrs_cc[] = {
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return (support_cc(coresight_pmu)) ? coresight_pmu_event_attrs_cc :
+					     coresight_pmu_event_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_event_attrs);
+
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_format_show);
+
+static struct attribute *coresight_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
+	CORESIGHT_FORMAT_ATTR(filter, "config:32-63"),
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return coresight_pmu_format_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_format_attrs);
+
+u32 coresight_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & CORESIGHT_EVENT_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_type);
+
+u32 coresight_pmu_event_filter(const struct perf_event *event)
+{
+	return (event->attr.config >> CORESIGHT_FILTER_SHIFT) &
+	       CORESIGHT_FILTER_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_filter);
+
+static ssize_t coresight_pmu_identifier_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *page)
+{
+	struct coresight_pmu *coresight_pmu =
+		to_coresight_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", coresight_pmu->identifier);
+}
+
+static struct device_attribute coresight_pmu_identifier_attr =
+	__ATTR(identifier, 0444, coresight_pmu_identifier_show, NULL);
+
+static struct attribute *coresight_pmu_identifier_attrs[] = {
+	&coresight_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_identifier_attr_group = {
+	.attrs = coresight_pmu_identifier_attrs,
+};
+
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
+{
+	const char *identifier =
+		devm_kasprintf(coresight_pmu->dev, GFP_KERNEL, "%x",
+			       coresight_pmu->impl.pmiidr);
+	return identifier;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_identifier);
+
+static ssize_t coresight_pmu_cpumask_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case CORESIGHT_ACTIVE_CPU_MASK:
+		cpumask = &coresight_pmu->active_cpu;
+		break;
+	case CORESIGHT_ASSOCIATED_CPU_MASK:
+		cpumask = &coresight_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *coresight_pmu_cpumask_attrs[] = {
+	CORESIGHT_CPUMASK_ATTR(cpumask, CORESIGHT_ACTIVE_CPU_MASK),
+	CORESIGHT_CPUMASK_ATTR(associated_cpus, CORESIGHT_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_cpumask_attr_group = {
+	.attrs = coresight_pmu_cpumask_attrs,
+};
+
+static const struct coresight_pmu_impl_ops default_impl_ops = {
+	.get_event_attrs	= coresight_pmu_get_event_attrs,
+	.get_format_attrs	= coresight_pmu_get_format_attrs,
+	.get_identifier		= coresight_pmu_get_identifier,
+	.is_cc_event		= coresight_pmu_is_cc_event,
+	.event_type		= coresight_pmu_event_type,
+	.event_filter		= coresight_pmu_event_filter
+};
+
+struct impl_match {
+	u32 jedec_jep106_id;
+	int (*impl_init_ops)(struct coresight_pmu *coresight_pmu);
+};
+
+static const struct impl_match impl_match[] = {
+	{}
+};
+
+static int coresight_pmu_init_impl_ops(struct coresight_pmu *coresight_pmu)
+{
+	int idx, ret;
+	u32 jedec_id;
+	struct acpi_apmt_node *apmt_node = coresight_pmu->apmt_node;
+	const struct impl_match *match = impl_match;
+
+	/*
+	 * Get the PMU implementer and product id from the APMT node.
+	 * If the APMT node doesn't have an implementer/product id, try to
+	 * get it from PMIIDR.
+	 */
+	coresight_pmu->impl.pmiidr =
+		(apmt_node->impl_id) ? apmt_node->impl_id :
+				       read_reg32(coresight_pmu->base0, PMIIDR);
+
+	jedec_id = coresight_pmu->impl.pmiidr & PMIIDR_IMPLEMENTER_MASK;
+
+	/* Find implementer specific attribute ops. */
+	for (idx = 0; match->jedec_jep106_id; match++, idx++) {
+		if (match->jedec_jep106_id == jedec_id) {
+			ret = match->impl_init_ops(coresight_pmu);
+			if (ret)
+				return ret;
+
+			return 0;
+		}
+	}
+
+	/* No implementer-specific attribute ops found; use the default. */
+	coresight_pmu->impl.ops = &default_impl_ops;
+	return 0;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_event_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *event_group;
+	struct device *dev = coresight_pmu->dev;
+
+	event_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!event_group)
+		return NULL;
+
+	event_group->name = "events";
+	event_group->attrs =
+		coresight_pmu->impl.ops->get_event_attrs(coresight_pmu);
+
+	return event_group;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_format_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = coresight_pmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs =
+		coresight_pmu->impl.ops->get_format_attrs(coresight_pmu);
+
+	return format_group;
+}
+
+static struct attribute_group **
+coresight_pmu_alloc_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	const struct coresight_pmu_impl_ops *impl_ops;
+	struct attribute_group **attr_groups = NULL;
+	struct device *dev = coresight_pmu->dev;
+	int ret;
+
+	ret = coresight_pmu_init_impl_ops(coresight_pmu);
+	if (ret)
+		return NULL;
+
+	impl_ops = coresight_pmu->impl.ops;
+
+	coresight_pmu->identifier = impl_ops->get_identifier(coresight_pmu);
+
+	attr_groups = devm_kzalloc(dev, 5 * sizeof(struct attribute_group *),
+				   GFP_KERNEL);
+	if (!attr_groups)
+		return NULL;
+
+	attr_groups[0] = coresight_pmu_alloc_event_attr_group(coresight_pmu);
+	attr_groups[1] = coresight_pmu_alloc_format_attr_group(coresight_pmu);
+	attr_groups[2] = &coresight_pmu_identifier_attr_group;
+	attr_groups[3] = &coresight_pmu_cpumask_attr_group;
+
+	return attr_groups;
+}
+
+static inline void
+coresight_pmu_start_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = read_reg32(coresight_pmu->base0, PMCR);
+	pmcr |= PMCR_E;
+	write_reg32(pmcr, coresight_pmu->base0, PMCR);
+}
+
+static inline void
+coresight_pmu_stop_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = read_reg32(coresight_pmu->base0, PMCR);
+	pmcr &= ~PMCR_E;
+	write_reg32(pmcr, coresight_pmu->base0, PMCR);
+}
+
+static void coresight_pmu_enable(struct pmu *pmu)
+{
+	int enabled;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	enabled = bitmap_weight(coresight_pmu->hw_events.used_ctrs,
+				CORESIGHT_PMU_MAX_HW_CNTRS);
+
+	if (!enabled)
+		return;
+
+	coresight_pmu_start_counters(coresight_pmu);
+}
+
+static void coresight_pmu_disable(struct pmu *pmu)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	coresight_pmu_stop_counters(coresight_pmu);
+}
+
+static inline bool is_cycle_cntr_idx(const struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	int idx = event->hw.idx;
+
+	return (support_cc(coresight_pmu) && idx == CORESIGHT_PMU_IDX_CCNTR);
+}
+
+bool coresight_pmu_is_cc_event(const struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	u32 evtype = coresight_pmu->impl.ops->event_type(event);
+
+	return (support_cc(coresight_pmu) &&
+		evtype == CORESIGHT_PMU_EVT_CYCLES_DEFAULT);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_is_cc_event);
+
+static int
+coresight_pmu_get_event_idx(struct coresight_pmu_hw_events *hw_events,
+			    struct perf_event *event)
+{
+	int idx, reserve_cc;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (coresight_pmu->impl.ops->is_cc_event(event)) {
+		/* Search for available cycle counter. */
+		if (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
+				     hw_events->used_ctrs))
+			return -EAGAIN;
+
+		return CORESIGHT_PMU_IDX_CCNTR;
+	}
+
+	/*
+	 * The CoreSight PMU can support up to 256 counters, and the cycle
+	 * counter is always counter[31]. To prevent a regular event from
+	 * taking the cycle counter, temporarily reserve the cycle counter bit.
+	 */
+	reserve_cc = 0;
+	if (support_cc(coresight_pmu) &&
+	    coresight_pmu->num_adj_counters >= CORESIGHT_PMU_IDX_CCNTR)
+		reserve_cc = (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
+					       hw_events->used_ctrs) == 0);
+
+	/* Search available regular counter from the used counter bitmap. */
+	idx = find_first_zero_bit(hw_events->used_ctrs,
+				  coresight_pmu->num_adj_counters);
+
+	/* Restore cycle counter bit. */
+	if (reserve_cc)
+		clear_bit(CORESIGHT_PMU_IDX_CCNTR, hw_events->used_ctrs);
+
+	if (idx >= coresight_pmu->num_adj_counters)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool
+coresight_pmu_validate_event(struct pmu *pmu,
+			     struct coresight_pmu_hw_events *hw_events,
+			     struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return (coresight_pmu_get_event_idx(hw_events, event) >= 0);
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool coresight_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct coresight_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events,
+						  sibling))
+			return false;
+	}
+
+	return coresight_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int coresight_pmu_event_init(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu;
+	struct hw_perf_event *hwc = &event->hw;
+
+	coresight_pmu = to_coresight_pmu(event->pmu);
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attaching to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &coresight_pmu->associated_cpus)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&coresight_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!coresight_pmu_validate_group(event))
+		return -EINVAL;
+
+	/*
+	 * We don't assign an index until we actually place the event onto
+	 * hardware. Use -1 to signify that we haven't decided where to put it
+	 * yet.
+	 */
+	hwc->idx = -1;
+	hwc->config_base = coresight_pmu->impl.ops->event_type(event);
+
+	return 0;
+}
+
+static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
+{
+	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
+}
+
+static void coresight_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	u32 offset;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+
+		coresight_pmu->write_reg64(val, coresight_pmu->base1, offset);
+	} else {
+		offset = counter_offset(sizeof(u32), event->hw.idx);
+
+		write_reg32(lower_32_bits(val), coresight_pmu->base1, offset);
+	}
+}
+
+static u64 coresight_pmu_read_counter(struct perf_event *event)
+{
+	u32 offset;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+		return coresight_pmu->read_reg64(coresight_pmu->base1, offset);
+	}
+
+	offset = counter_offset(sizeof(u32), event->hw.idx);
+	return read_reg32(coresight_pmu->base1, offset);
+}
+
+/*
+ * coresight_pmu_set_event_period: Set the period for the counter.
+ *
+ * To handle cases of extreme interrupt latency, we program
+ * the counter with half of its maximum value.
+ */
+static void coresight_pmu_set_event_period(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	u64 val = GENMASK_ULL(pmcfgr_size(coresight_pmu), 0) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	coresight_pmu_write_counter(event, val);
+}
+
+static void coresight_pmu_enable_counter(struct coresight_pmu *coresight_pmu,
+					 int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENSET + (4 * reg_id);
+	cnten_off = PMCNTENSET + (4 * reg_id);
+
+	write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
+	write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
+}
+
+static void coresight_pmu_disable_counter(struct coresight_pmu *coresight_pmu,
+					  int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENCLR + (4 * reg_id);
+	cnten_off = PMCNTENCLR + (4 * reg_id);
+
+	write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
+	write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
+}
+
+static void coresight_pmu_event_update(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = coresight_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	delta = (now - prev) & GENMASK_ULL(pmcfgr_size(coresight_pmu), 0);
+	local64_add(delta, &event->count);
+}
+
+static inline void coresight_pmu_set_event(struct coresight_pmu *coresight_pmu,
+					   struct hw_perf_event *hwc)
+{
+	u32 offset = PMEVTYPER + (4 * hwc->idx);
+
+	write_reg32(hwc->config_base, coresight_pmu->base0, offset);
+}
+
+static inline void
+coresight_pmu_set_ev_filter(struct coresight_pmu *coresight_pmu,
+			    struct hw_perf_event *hwc, u32 filter)
+{
+	u32 offset = PMEVFILTR + (4 * hwc->idx);
+
+	write_reg32(filter, coresight_pmu->base0, offset);
+}
+
+static inline void
+coresight_pmu_set_cc_filter(struct coresight_pmu *coresight_pmu, u32 filter)
+{
+	u32 offset = PMCCFILTR;
+
+	write_reg32(filter, coresight_pmu->base0, offset);
+}
+
+static void coresight_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 filter;
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
+
+	coresight_pmu_set_event_period(event);
+
+	filter = coresight_pmu->impl.ops->event_filter(event);
+
+	if (is_cycle_cntr_idx(event)) {
+		coresight_pmu_set_cc_filter(coresight_pmu, filter);
+	} else {
+		coresight_pmu_set_event(coresight_pmu, hwc);
+		coresight_pmu_set_ev_filter(coresight_pmu, hwc, filter);
+	}
+
+	hwc->state = 0;
+
+	coresight_pmu_enable_counter(coresight_pmu, hwc->idx);
+}
+
+static void coresight_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	coresight_pmu_disable_counter(coresight_pmu, hwc->idx);
+	coresight_pmu_event_update(event);
+
+	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int coresight_pmu_add(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &coresight_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = coresight_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		coresight_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void coresight_pmu_del(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->idx;
+
+	coresight_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void coresight_pmu_read(struct perf_event *event)
+{
+	coresight_pmu_event_update(event);
+}
+
+static int coresight_pmu_alloc(struct platform_device *pdev,
+			       struct coresight_pmu **coresight_pmu)
+{
+	struct acpi_apmt_node **pdata;
+	struct acpi_apmt_node *apmt_node;
+	struct device *dev;
+	struct coresight_pmu *pmu;
+
+	dev = &pdev->dev;
+	pdata = dev_get_platdata(dev);
+	if (!pdata || !*pdata) {
+		dev_err(dev, "failed to get APMT node\n");
+		return -ENODEV;
+	}
+	apmt_node = *pdata;
+
+	pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
+	if (!pmu)
+		return -ENOMEM;
+
+	*coresight_pmu = pmu;
+
+	pmu->dev = dev;
+	pmu->apmt_node = apmt_node;
+	pmu->name =
+		devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);
+	if (!pmu->name)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_init_mmio(struct coresight_pmu *coresight_pmu)
+{
+	struct device *dev;
+	struct platform_device *pdev;
+	struct resource *res;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Base address for page 0. */
+	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+	if (!res) {
+		dev_err(dev, "failed to get page-0 resource\n");
+		return -ENODEV;
+	}
+
+	coresight_pmu->base0 = devm_ioremap_resource(dev, res);
+	if (IS_ERR(coresight_pmu->base0)) {
+		dev_err(dev, "ioremap failed for page-0 resource\n");
+		return PTR_ERR(coresight_pmu->base0);
+	}
+
+	/* Base address for page 1 if supported. Otherwise point it to page 0. */
+	coresight_pmu->base1 = coresight_pmu->base0;
+	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
+		res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
+		if (!res) {
+			dev_err(dev, "failed to get page-1 resource\n");
+			return -ENODEV;
+		}
+
+		coresight_pmu->base1 = devm_ioremap_resource(dev, res);
+		if (IS_ERR(coresight_pmu->base1)) {
+			dev_err(dev, "ioremap failed for page-1 resource\n");
+			return PTR_ERR(coresight_pmu->base1);
+		}
+	}
+
+	if (CHECK_APMT_FLAG(apmt_node->flags, ATOMIC, SUPP)) {
+		coresight_pmu->read_reg64 = &read_reg64;
+		coresight_pmu->write_reg64 = &write_reg64;
+	} else {
+		coresight_pmu->read_reg64 = &read_reg64_hilohi;
+		coresight_pmu->write_reg64 = &write_reg64_lohi;
+	}
+
+	coresight_pmu->pmcfgr = read_reg32(coresight_pmu->base0, PMCFGR);
+
+	coresight_pmu->num_adj_counters = pmcfgr_n(coresight_pmu) + 1;
+
+	if (support_cc(coresight_pmu)) {
+		/*
+		 * Exclude the cycle counter if there is a gap between
+		 * the cycle counter id and the last regular event
+		 * counter id.
+		 */
+		if (coresight_pmu->num_adj_counters <= CORESIGHT_PMU_IDX_CCNTR)
+			coresight_pmu->num_adj_counters -= 1;
+	}
+
+	coresight_pmu->num_set_clr_reg =
+		round_up(coresight_pmu->num_adj_counters,
+			 CORESIGHT_SET_CLR_REG_COUNTER_NUM) /
+		CORESIGHT_SET_CLR_REG_COUNTER_NUM;
+
+	return 0;
+}
+
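
The tail of `coresight_pmu_init_mmio()` derives two counts from PMCFGR. A standalone sketch of that bookkeeping, under the assumptions that `pmcfgr_n()` encodes (number of counters - 1), the cycle counter sits at fixed index 31, and each set/clear register covers 32 counters:

```c
#include <assert.h>
#include <stdint.h>

struct counter_layout {
	uint32_t num_adj_counters;
	uint32_t num_set_clr_reg;
};

static struct counter_layout compute_layout(uint32_t pmcfgr_n, int has_cc)
{
	struct counter_layout l;

	l.num_adj_counters = pmcfgr_n + 1;

	/*
	 * If a cycle counter exists but the regular counters do not
	 * reach index 31, the cycle counter is separated from them by
	 * a gap and is excluded from the adjacent-counter count.
	 */
	if (has_cc && l.num_adj_counters <= 31)
		l.num_adj_counters -= 1;

	/* Round up to whole 32-bit set/clear registers. */
	l.num_set_clr_reg = (l.num_adj_counters + 31) / 32;

	return l;
}
```

For example, 8 regular counters plus a cycle counter at index 31 give 8 adjacent counters served by a single set/clear register, while 33 contiguous counters would need two.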
+static inline int
+coresight_pmu_get_reset_overflow(struct coresight_pmu *coresight_pmu,
+				 u32 *pmovs)
+{
+	int i;
+	u32 pmovclr_offset = PMOVSCLR;
+	u32 has_overflowed = 0;
+
+	for (i = 0; i < coresight_pmu->num_set_clr_reg; ++i) {
+		pmovs[i] = read_reg32(coresight_pmu->base1, pmovclr_offset);
+		has_overflowed |= pmovs[i];
+		write_reg32(pmovs[i], coresight_pmu->base1, pmovclr_offset);
+		pmovclr_offset += sizeof(u32);
+	}
+
+	return has_overflowed != 0;
+}
+
+static irqreturn_t coresight_pmu_handle_irq(int irq_num, void *dev)
+{
+	int idx, has_overflowed;
+	struct coresight_pmu *coresight_pmu = dev;
+	u32 pmovs[CORESIGHT_SET_CLR_REG_MAX_NUM] = { 0 };
+	bool handled = false;
+
+	coresight_pmu_stop_counters(coresight_pmu);
+
+	has_overflowed = coresight_pmu_get_reset_overflow(coresight_pmu, pmovs);
+	if (!has_overflowed)
+		goto done;
+
+	for_each_set_bit(idx, (unsigned long *)pmovs,
+			 CORESIGHT_PMU_MAX_HW_CNTRS) {
+		struct perf_event *event = coresight_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		coresight_pmu_event_update(event);
+		coresight_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+done:
+	coresight_pmu_start_counters(coresight_pmu);
+	return IRQ_RETVAL(handled);
+}
+
+static int coresight_pmu_request_irq(struct coresight_pmu *coresight_pmu)
+{
+	int irq, ret;
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Skip IRQ request if the PMU does not support overflow interrupt. */
+	if (apmt_node->ovflw_irq == 0)
+		return 0;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ret = devm_request_irq(dev, irq, coresight_pmu_handle_irq,
+			       IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
+			       coresight_pmu);
+	if (ret) {
+		dev_err(dev, "Could not request IRQ %d\n", irq);
+		return ret;
+	}
+
+	coresight_pmu->irq = irq;
+
+	return 0;
+}
+
+static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
+{
+	u32 acpi_uid;
+	struct device *cpu_dev;
+	struct acpi_device *acpi_dev;
+
+	cpu_dev = get_cpu_device(cpu);
+	if (!cpu_dev)
+		return -ENODEV;
+
+	acpi_dev = ACPI_COMPANION(cpu_dev);
+	while (acpi_dev) {
+		if (!strcmp(acpi_device_hid(acpi_dev),
+			    ACPI_PROCESSOR_CONTAINER_HID) &&
+		    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
+		    acpi_uid == container_uid)
+			return 0;
+
+		acpi_dev = acpi_dev->parent;
+	}
+
+	return -ENODEV;
+}
+
+static int coresight_pmu_get_cpus(struct coresight_pmu *coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	int affinity_flag;
+	int cpu;
+
+	apmt_node = coresight_pmu->apmt_node;
+	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
+
+	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
+		for_each_possible_cpu(cpu) {
+			if (apmt_node->proc_affinity ==
+			    get_acpi_id_for_cpu(cpu)) {
+				cpumask_set_cpu(
+					cpu, &coresight_pmu->associated_cpus);
+				break;
+			}
+		}
+	} else {
+		for_each_possible_cpu(cpu) {
+			if (coresight_pmu_find_cpu_container(
+				    cpu, apmt_node->proc_affinity))
+				continue;
+
+			cpumask_set_cpu(cpu, &coresight_pmu->associated_cpus);
+		}
+	}
+
+	return 0;
+}
+
+static int coresight_pmu_register_pmu(struct coresight_pmu *coresight_pmu)
+{
+	int ret;
+	struct attribute_group **attr_groups;
+
+	attr_groups = coresight_pmu_alloc_attr_group(coresight_pmu);
+	if (!attr_groups)
+		return -ENOMEM;
+
+	ret = cpuhp_state_add_instance(coresight_pmu_cpuhp_state,
+				       &coresight_pmu->cpuhp_node);
+	if (ret)
+		return ret;
+
+	coresight_pmu->pmu = (struct pmu){
+		.task_ctx_nr	= perf_invalid_context,
+		.module		= THIS_MODULE,
+		.pmu_enable	= coresight_pmu_enable,
+		.pmu_disable	= coresight_pmu_disable,
+		.event_init	= coresight_pmu_event_init,
+		.add		= coresight_pmu_add,
+		.del		= coresight_pmu_del,
+		.start		= coresight_pmu_start,
+		.stop		= coresight_pmu_stop,
+		.read		= coresight_pmu_read,
+		.attr_groups	= (const struct attribute_group **)attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
+	};
+
+	ret = perf_pmu_register(&coresight_pmu->pmu, coresight_pmu->name, -1);
+	if (ret) {
+		cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+					    &coresight_pmu->cpuhp_node);
+	}
+
+	return ret;
+}
+
+static int coresight_pmu_device_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct coresight_pmu *coresight_pmu;
+
+	ret = coresight_pmu_alloc(pdev, &coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_init_mmio(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_request_irq(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_get_cpus(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_register_pmu(coresight_pmu);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int coresight_pmu_device_remove(struct platform_device *pdev)
+{
+	struct coresight_pmu *coresight_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&coresight_pmu->pmu);
+	cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+				    &coresight_pmu->cpuhp_node);
+
+	return 0;
+}
+
+static struct platform_driver coresight_pmu_driver = {
+	.driver = {
+			.name = "arm-coresight-pmu",
+			.suppress_bind_attrs = true,
+		},
+	.probe = coresight_pmu_device_probe,
+	.remove = coresight_pmu_device_remove,
+};
+
+static void coresight_pmu_set_active_cpu(int cpu,
+					 struct coresight_pmu *coresight_pmu)
+{
+	cpumask_set_cpu(cpu, &coresight_pmu->active_cpu);
+	WARN_ON(irq_set_affinity(coresight_pmu->irq,
+				 &coresight_pmu->active_cpu));
+}
+
+static int coresight_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &coresight_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&coresight_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	coresight_pmu_set_active_cpu(cpu, coresight_pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct cpumask online_supported;
+
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &coresight_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	cpumask_and(&online_supported, &coresight_pmu->associated_cpus,
+		    cpu_online_mask);
+	dst = cpumask_any_but(&online_supported, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&coresight_pmu->pmu, cpu, dst);
+	coresight_pmu_set_active_cpu(dst, coresight_pmu);
+
+	return 0;
+}
+
+static int __init coresight_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, PMUNAME,
+				      coresight_pmu_cpu_online,
+				      coresight_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	coresight_pmu_cpuhp_state = ret;
+	return platform_driver_register(&coresight_pmu_driver);
+}
+
+static void __exit coresight_pmu_exit(void)
+{
+	platform_driver_unregister(&coresight_pmu_driver);
+	cpuhp_remove_multi_state(coresight_pmu_cpuhp_state);
+}
+
+module_init(coresight_pmu_init);
+module_exit(coresight_pmu_exit);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.h b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
new file mode 100644
index 000000000000..59fb40eafe45
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
@@ -0,0 +1,147 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ARM CoreSight PMU driver.
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+
+#ifndef __ARM_CORESIGHT_PMU_H__
+#define __ARM_CORESIGHT_PMU_H__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#define to_coresight_pmu(p) (container_of(p, struct coresight_pmu, pmu))
+
+#define CORESIGHT_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define CORESIGHT_FORMAT_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_sysfs_format_show,	\
+			   (char *)_config)
+
+#define CORESIGHT_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, coresight_pmu_sysfs_event_show, _config)
+
+/*
+ * This is the default event number for cycle counting, if supported, since
+ * the ARM CoreSight PMU specification does not define a standard event code
+ * for the cycle counter.
+ */
+#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 31)
+
+/*
+ * The ARM CoreSight PMU supports up to 256 event counters.
+ * If the counters are larger than 32 bits, the PMU includes at
+ * most 128 counters.
+ */
+#define CORESIGHT_PMU_MAX_HW_CNTRS 256
+
+/* The cycle counter, if implemented, is located at counter[31]. */
+#define CORESIGHT_PMU_IDX_CCNTR 31
+
+struct coresight_pmu;
+
+/* This tracks the events assigned to each counter in the PMU. */
+struct coresight_pmu_hw_events {
+	/* The events that are active on the PMU for the given index. */
+	struct perf_event *events[CORESIGHT_PMU_MAX_HW_CNTRS];
+
+	/* Each bit indicates a counter is being used (or not) for an event. */
+	DECLARE_BITMAP(used_ctrs, CORESIGHT_PMU_MAX_HW_CNTRS);
+};
+
+/* Ops to query vendor/implementer-specific attributes. */
+struct coresight_pmu_impl_ops {
+	/* Get event attributes */
+	struct attribute **(*get_event_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get format attributes */
+	struct attribute **(*get_format_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get string identifier */
+	const char *(*get_identifier)(const struct coresight_pmu *coresight_pmu);
+	/* Check if the event corresponds to cycle count event */
+	bool (*is_cc_event)(const struct perf_event *event);
+	/* Decode event type/id from configs */
+	u32 (*event_type)(const struct perf_event *event);
+	/* Decode filter value from configs */
+	u32 (*event_filter)(const struct perf_event *event);
+};
+
+/* Vendor/implementer descriptor. */
+struct coresight_pmu_impl {
+	u32 pmiidr;
+	const struct coresight_pmu_impl_ops *ops;
+};
+
+/* CoreSight PMU descriptor. */
+struct coresight_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_apmt_node *apmt_node;
+	const char *name;
+	const char *identifier;
+	void __iomem *base0;
+	void __iomem *base1;
+	int irq;
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node cpuhp_node;
+
+	u32 pmcfgr;
+	u32 num_adj_counters;
+	u32 num_set_clr_reg;
+
+	struct coresight_pmu_hw_events hw_events;
+
+	void (*write_reg64)(u64 val, void __iomem *base, u32 offset);
+	u64 (*read_reg64)(void __iomem *base, u32 offset);
+
+	struct coresight_pmu_impl impl;
+};
+
+/* Default function to show event attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf);
+
+/* Default function to show format attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf);
+
+/* Get the default CoreSight PMU event attributes. */
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default CoreSight PMU format attributes. */
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default CoreSight PMU device identifier. */
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu);
+
+/* Default function to query if an event is a cycle counter event. */
+bool coresight_pmu_is_cc_event(const struct perf_event *event);
+
+/* Default function to query the type/id of an event. */
+u32 coresight_pmu_event_type(const struct perf_event *event);
+
+/* Default function to query the filter value of an event. */
+u32 coresight_pmu_event_filter(const struct perf_event *event);
+
+#endif /* __ARM_CORESIGHT_PMU_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* [PATCH 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
  2022-05-09  0:28 [PATCH 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-05-09  0:28 ` [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-05-09  0:28 ` Besar Wicaksono
  2022-05-09  9:28 ` [PATCH 0/2] perf: ARM CoreSight PMU support Will Deacon
  2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
  3 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-09  0:28 UTC (permalink / raw)
  To: catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
Fabric (MCF) PMU attributes for CoreSight PMU implementation in
NVIDIA devices.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 drivers/perf/coresight_pmu/Makefile           |   3 +-
 .../perf/coresight_pmu/arm_coresight_pmu.c    |   2 +
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  | 300 ++++++++++++++++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |  17 +
 4 files changed, 321 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h

diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
index a2a7a5fbbc16..181b1b0dbaa1 100644
--- a/drivers/perf/coresight_pmu/Makefile
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -3,4 +3,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
-	arm_coresight_pmu.o
+	arm_coresight_pmu.o \
+	arm_coresight_pmu_nvidia.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
index 1e9553d29717..e5e50ad344b2 100644
--- a/drivers/perf/coresight_pmu/arm_coresight_pmu.c
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -39,6 +39,7 @@
 #include <acpi/processor.h>
 
 #include "arm_coresight_pmu.h"
+#include "arm_coresight_pmu_nvidia.h"
 
 #define PMUNAME "arm_coresight_pmu"
 
@@ -411,6 +412,7 @@ struct impl_match {
 };
 
 static const struct impl_match impl_match[] = {
+	{ .jedec_jep106_id = 0x36B, .impl_init_ops = nv_coresight_init_ops },
 	{}
 };
 
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
new file mode 100644
index 000000000000..79de6e0f6a05
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
@@ -0,0 +1,300 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#include "arm_coresight_pmu_nvidia.h"
+
+#define NV_EVENT_ID_MASK		0xFFFFFFFFULL
+#define NV_DEFAULT_FILTER_ID_MASK	0xFFFFFFFFULL
+
+#define NV_FILTER_ID_SHIFT		32ULL
+
+#define NV_MCF_PCIE_PORT_COUNT		10ULL
+#define NV_MCF_PCIE_FILTER_ID_MASK	((1ULL << NV_MCF_PCIE_PORT_COUNT) - 1)
+
+#define NV_MCF_GPU_PORT_COUNT		2ULL
+#define NV_MCF_GPU_FILTER_ID_MASK	((1ULL << NV_MCF_GPU_PORT_COUNT) - 1)
+
+#define NV_MCF_NVLINK_PORT_COUNT	4ULL
+#define NV_MCF_NVLINK_FILTER_ID_MASK	((1ULL << NV_MCF_NVLINK_PORT_COUNT) - 1)
+
+#define PMIIDR_PRODUCTID_MASK		0xFFF
+#define PMIIDR_PRODUCTID_SHIFT		20
+
+#define to_nv_pmu_impl(coresight_pmu)	\
+	(container_of(coresight_pmu->impl.ops, struct nv_pmu_impl, ops))
+
+#define CORESIGHT_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)	\
+	CORESIGHT_EVENT_ATTR(_pref##_num##_suff, _config)
+
+#define CORESIGHT_EVENT_ATTR_4(_pref, _suff, _config)			\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
+
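
The `CORESIGHT_EVENT_ATTR_4` helper above relies on two-level macro expansion so the infix tokens `_0_` through `_3_` end up inside each generated attribute name. A stripped-down model of that pattern, with hypothetical macro names, using stringization (adjacent string literals concatenate) to make the composed name observable instead of building real sysfs attributes:

```c
#include <assert.h>
#include <string.h>

/* Inner macro stringizes the three pieces; the outer level mirrors
 * the indirection used by CORESIGHT_EVENT_ATTR_4_INNER so arguments
 * are fully expanded before being combined. */
#define EVT_NAME_INNER(_pref, _num, _suff) #_pref #_num #_suff
#define EVT_NAME(_pref, _num, _suff) EVT_NAME_INNER(_pref, _num, _suff)
```

One `CORESIGHT_EVENT_ATTR_4(socket, rd_data, 0x101)` invocation therefore yields four attributes named `socket_0_rd_data` .. `socket_3_rd_data` at consecutive config values.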
+struct nv_pmu_impl {
+	struct coresight_pmu_impl_ops ops;
+	const char *identifier;
+	u32 filter_mask;
+	struct attribute **event_attr;
+	struct attribute **format_attr;
+};
+
+static struct attribute *scf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(bus_cycles,			0x1d),
+
+	CORESIGHT_EVENT_ATTR(scf_cache_allocate,		0xF0),
+	CORESIGHT_EVENT_ATTR(scf_cache_refill,			0xF1),
+	CORESIGHT_EVENT_ATTR(scf_cache,				0xF2),
+	CORESIGHT_EVENT_ATTR(scf_cache_wb,			0xF3),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_data,			0x101),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_rsp,			0x105),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_data,			0x109),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_data,		0x111),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_access,		0x12d),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_access,		0x131),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_access,		0x135),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_access,		0x139),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_access,		0x13d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_access,		0x141),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_outstanding,	0x151),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_outstanding,	0x155),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_data,		0x159),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
+
+	CORESIGHT_EVENT_ATTR(gmem_rd_data,			0x16d),
+	CORESIGHT_EVENT_ATTR(gmem_rd_access,			0x16e),
+	CORESIGHT_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
+	CORESIGHT_EVENT_ATTR(gmem_dl_rsp,			0x170),
+	CORESIGHT_EVENT_ATTR(gmem_dl_access,			0x171),
+	CORESIGHT_EVENT_ATTR(gmem_dl_outstanding,		0x172),
+	CORESIGHT_EVENT_ATTR(gmem_wb_data,			0x173),
+	CORESIGHT_EVENT_ATTR(gmem_wb_access,			0x174),
+	CORESIGHT_EVENT_ATTR(gmem_wb_outstanding,		0x175),
+	CORESIGHT_EVENT_ATTR(gmem_ev_rsp,			0x176),
+	CORESIGHT_EVENT_ATTR(gmem_ev_access,			0x177),
+	CORESIGHT_EVENT_ATTR(gmem_ev_outstanding,		0x178),
+	CORESIGHT_EVENT_ATTR(gmem_wr_data,			0x179),
+	CORESIGHT_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
+	CORESIGHT_EVENT_ATTR(gmem_wr_access,			0x17b),
+
+	CORESIGHT_EVENT_ATTR_4(socket, wr_data,			0x17c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_outstanding,	0x18c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_data,		0x190),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_data,		0x194),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
+
+	CORESIGHT_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
+	CORESIGHT_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_outstanding,	0x1a3),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_access,		0x1a4),
+
+	CORESIGHT_EVENT_ATTR(cmem_rd_data,			0x1a5),
+	CORESIGHT_EVENT_ATTR(cmem_rd_access,			0x1a6),
+	CORESIGHT_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
+	CORESIGHT_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
+	CORESIGHT_EVENT_ATTR(cmem_dl_access,			0x1a9),
+	CORESIGHT_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
+	CORESIGHT_EVENT_ATTR(cmem_wb_data,			0x1ab),
+	CORESIGHT_EVENT_ATTR(cmem_wb_access,			0x1ac),
+	CORESIGHT_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
+	CORESIGHT_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
+	CORESIGHT_EVENT_ATTR(cmem_ev_access,			0x1af),
+	CORESIGHT_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
+	CORESIGHT_EVENT_ATTR(cmem_wr_data,			0x1b1),
+	CORESIGHT_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_outstanding,	0x1bf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_outstanding,	0x1c3),
+
+	CORESIGHT_EVENT_ATTR(ocu_prb_access,			0x1c7),
+	CORESIGHT_EVENT_ATTR(ocu_prb_data,			0x1c8),
+	CORESIGHT_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_access,			0x1ca),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_outstanding,	0x1d7),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
+
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *mcf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(rd_bytes_loc,			0x0),
+	CORESIGHT_EVENT_ATTR(rd_bytes_rem,			0x1),
+	CORESIGHT_EVENT_ATTR(wr_bytes_loc,			0x2),
+	CORESIGHT_EVENT_ATTR(wr_bytes_rem,			0x3),
+	CORESIGHT_EVENT_ATTR(total_bytes_loc,			0x4),
+	CORESIGHT_EVENT_ATTR(total_bytes_rem,			0x5),
+	CORESIGHT_EVENT_ATTR(rd_req_loc,			0x6),
+	CORESIGHT_EVENT_ATTR(rd_req_rem,			0x7),
+	CORESIGHT_EVENT_ATTR(wr_req_loc,			0x8),
+	CORESIGHT_EVENT_ATTR(wr_req_rem,			0x9),
+	CORESIGHT_EVENT_ATTR(total_req_loc,			0xa),
+	CORESIGHT_EVENT_ATTR(total_req_rem,			0xb),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_loc,			0xc),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_rem,			0xd),
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *scf_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
+	NULL,
+};
+
+static struct attribute *mcf_pcie_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
+	CORESIGHT_FORMAT_ATTR(root_port, "config:32-41"),
+	NULL,
+};
+
+static struct attribute *mcf_gpu_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
+	CORESIGHT_FORMAT_ATTR(gpu, "config:32-33"),
+	NULL,
+};
+
+static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
+	CORESIGHT_FORMAT_ATTR(socket, "config:32-35"),
+	NULL,
+};
+
+static struct attribute **
+nv_coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->event_attr;
+}
+
+static struct attribute **
+nv_coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->format_attr;
+}
+
+static const char *
+nv_coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->identifier;
+}
+
+static u32 nv_coresight_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & NV_EVENT_ID_MASK;
+}
+
+static u32 nv_coresight_pmu_event_filter(const struct perf_event *event)
+{
+	const struct nv_pmu_impl *impl =
+		to_nv_pmu_impl(to_coresight_pmu(event->pmu));
+
+	return (event->attr.config >> NV_FILTER_ID_SHIFT) & impl->filter_mask;
+}
+
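
As the masks and shift above indicate, the NVIDIA implementation splits `perf_event_attr.config` into an event id in bits [31:0] and a filter value in bits [63:32], with the filter further masked per product (e.g. a 10-bit PCIE root-port mask). A standalone sketch of that encoding, mirroring `nv_coresight_pmu_event_type()`/`_filter()`:

```c
#include <assert.h>
#include <stdint.h>

#define NV_EVENT_ID_MASK	0xFFFFFFFFULL
#define NV_FILTER_ID_SHIFT	32ULL

/* Event id occupies the low 32 bits of config. */
static uint32_t nv_event_type(uint64_t config)
{
	return (uint32_t)(config & NV_EVENT_ID_MASK);
}

/* Filter occupies the high 32 bits, clipped to the product's mask. */
static uint32_t nv_event_filter(uint64_t config, uint32_t filter_mask)
{
	return (uint32_t)((config >> NV_FILTER_ID_SHIFT) & filter_mask);
}
```

This matches the sysfs format strings later in the patch, e.g. `event, "config:0-31"` and `root_port, "config:32-41"` for the PCIE PMU.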
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu)
+{
+	u32 product_id;
+	struct nv_pmu_impl *impl;
+
+	impl = devm_kzalloc(coresight_pmu->dev, sizeof(*impl), GFP_KERNEL);
+	if (!impl)
+		return -ENOMEM;
+
+	product_id = (coresight_pmu->impl.pmiidr >> PMIIDR_PRODUCTID_SHIFT) &
+		     PMIIDR_PRODUCTID_MASK;
+
+	switch (product_id) {
+	case 0x103:
+		impl->identifier	= "nvidia_mcf_pcie";
+		impl->filter_mask	= NV_MCF_PCIE_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_pcie_pmu_format_attrs;
+		break;
+	case 0x104:
+		impl->identifier	= "nvidia_mcf_gpuvir";
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x105:
+		impl->identifier	= "nvidia_mcf_gpu";
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x106:
+		impl->identifier	= "nvidia_mcf_nvlink";
+		impl->filter_mask	= NV_MCF_NVLINK_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_nvlink_pmu_format_attrs;
+		break;
+	case 0x2CF:
+		impl->identifier	= "nvidia_scf";
+		impl->filter_mask	= 0x0;
+		impl->event_attr	= scf_pmu_event_attrs;
+		impl->format_attr	= scf_pmu_format_attrs;
+		break;
+	default:
+		impl->identifier  = coresight_pmu_get_identifier(coresight_pmu);
+		impl->filter_mask = NV_DEFAULT_FILTER_ID_MASK;
+		impl->event_attr  = coresight_pmu_get_event_attrs(coresight_pmu);
+		impl->format_attr =
+			coresight_pmu_get_format_attrs(coresight_pmu);
+		break;
+	}
+
+	impl->ops.get_event_attrs	= nv_coresight_pmu_get_event_attrs;
+	impl->ops.get_format_attrs	= nv_coresight_pmu_get_format_attrs;
+	impl->ops.get_identifier	= nv_coresight_pmu_get_identifier;
+	impl->ops.is_cc_event		= coresight_pmu_is_cc_event;
+	impl->ops.event_type		= nv_coresight_pmu_event_type;
+	impl->ops.event_filter		= nv_coresight_pmu_event_filter;
+
+	coresight_pmu->impl.ops = &impl->ops;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nv_coresight_init_ops);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
new file mode 100644
index 000000000000..3c81c16c14f4
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#ifndef __ARM_CORESIGHT_PMU_NVIDIA_H__
+#define __ARM_CORESIGHT_PMU_NVIDIA_H__
+
+#include "arm_coresight_pmu.h"
+
+/* Allocate NVIDIA descriptor. */
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu);
+
+#endif /* __ARM_CORESIGHT_PMU_NVIDIA_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-09  0:28 [PATCH 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-05-09  0:28 ` [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-05-09  0:28 ` [PATCH 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
@ 2022-05-09  9:28 ` Will Deacon
  2022-05-09 10:02   ` Suzuki K Poulose
  2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
  3 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2022-05-09  9:28 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, sudeep.holla, thanu.rangarajan, Michael.Williams,
	suzuki.poulose, treding, jonathanh, vsethi

On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> Add driver support for ARM CoreSight PMU device and event attributes for NVIDIA
> implementation. The code is based on ARM Coresight PMU architecture and ACPI ARM
> Performance Monitoring Unit table (APMT) specification below:
>  * ARM Coresight PMU:
>         https://developer.arm.com/documentation/ihi0091/latest
>  * APMT: https://developer.arm.com/documentation/den0117/latest
> 
> Notes:
>  * There is a concern on the naming of the PMU device.
>    Currently the driver is probing "arm-coresight-pmu" device, however the APMT
>    spec supports different kinds of CoreSight PMU based implementation. So it is
>    open for discussion if the name can stay or a "generic" name is required.
>    Please see the following thread:
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html
> 
> Besar Wicaksono (2):
>   perf: coresight_pmu: Add support for ARM CoreSight PMU driver
>   perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
> 
>  arch/arm64/configs/defconfig                  |    1 +
>  drivers/perf/Kconfig                          |    2 +
>  drivers/perf/Makefile                         |    1 +
>  drivers/perf/coresight_pmu/Kconfig            |   10 +
>  drivers/perf/coresight_pmu/Makefile           |    7 +
>  .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
>  .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>  9 files changed, 1802 insertions(+)

How does this interact with all the stuff we have under
drivers/hwtracing/coresight/?

Will


* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-09  9:28 ` [PATCH 0/2] perf: ARM CoreSight PMU support Will Deacon
@ 2022-05-09 10:02   ` Suzuki K Poulose
  2022-05-09 12:20     ` Shaokun Zhang
                       ` (2 more replies)
  0 siblings, 3 replies; 31+ messages in thread
From: Suzuki K Poulose @ 2022-05-09 10:02 UTC (permalink / raw)
  To: Will Deacon, Besar Wicaksono
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, sudeep.holla, thanu.rangarajan, Michael.Williams,
	treding, jonathanh, vsethi, Mathieu Poirier,
	Michael Williams (ATG)

Cc: Mike Williams, Mathieu Poirier

On 09/05/2022 10:28, Will Deacon wrote:
> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
>> Add driver support for ARM CoreSight PMU device and event attributes for NVIDIA
>> implementation. The code is based on ARM Coresight PMU architecture and ACPI ARM
>> Performance Monitoring Unit table (APMT) specification below:
>>   * ARM Coresight PMU:
>>          https://developer.arm.com/documentation/ihi0091/latest
>>   * APMT: https://developer.arm.com/documentation/den0117/latest
>>
>> Notes:
>>   * There is a concern on the naming of the PMU device.
>>     Currently the driver is probing "arm-coresight-pmu" device, however the APMT
>>     spec supports different kinds of CoreSight PMU based implementation. So it is
>>     open for discussion if the name can stay or a "generic" name is required.
>>     Please see the following thread:
>>     http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html
>>
>> Besar Wicaksono (2):
>>    perf: coresight_pmu: Add support for ARM CoreSight PMU driver
>>    perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
>>
>>   arch/arm64/configs/defconfig                  |    1 +
>>   drivers/perf/Kconfig                          |    2 +
>>   drivers/perf/Makefile                         |    1 +
>>   drivers/perf/coresight_pmu/Kconfig            |   10 +
>>   drivers/perf/coresight_pmu/Makefile           |    7 +
>>   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
>>   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>>   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
>>   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>>   9 files changed, 1802 insertions(+)
> 
> How does this interact with all the stuff we have under
> drivers/hwtracing/coresight/?

Absolutely zero, except for the name. The standard
is named "CoreSight PMU", which is a bit unfortunate,
given that the only link, AFAIU, with the "CoreSight"
architecture is the Lock Access Register (LAR). For reference,
drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
tracing, and its PMU is called "cs_etm" (expands to CoreSight ETM).
Otherwise the standard doesn't have anything to do with what
already exists in the kernel.

That said, I am concerned that "coresight_pmu" is easily confused
with what exists today. Given that this is more of a "PMU" standard
for the IPs in the Arm world, it would be better to name it as such,
avoiding any confusion with the existing PMUs.

One potential recommendation for the name is "Arm PMU" (the ACPI
table is named the Arm PMU Table). But then that could clash with
armv8_pmu :-(.

Some of the other options are :

"Arm Generic PMU"
"Arm Uncore PMU"
"Arm PMU"

Suzuki

> 
> Will



* Re: [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-09  0:28 ` [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-05-09 12:13   ` Robin Murphy
  2022-05-11  2:46     ` Besar Wicaksono
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2022-05-09 12:13 UTC (permalink / raw)
  To: Besar Wicaksono, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi

On 2022-05-09 01:28, Besar Wicaksono wrote:
> Add support for ARM CoreSight PMU driver framework and interfaces.
> The driver provides generic implementation to operate uncore PMU based
> on ARM CoreSight PMU architecture. The driver also provides interface
> to get vendor/implementation specific information, for example event
> attributes and formatting.
> 
> The specification used in this implementation can be found below:
>   * ACPI Arm Performance Monitoring Unit table:
>          https://developer.arm.com/documentation/den0117/latest
>   * ARM Coresight PMU architecture:
>          https://developer.arm.com/documentation/ihi0091/latest
> 
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>   arch/arm64/configs/defconfig                  |    1 +
>   drivers/perf/Kconfig                          |    2 +
>   drivers/perf/Makefile                         |    1 +
>   drivers/perf/coresight_pmu/Kconfig            |   10 +
>   drivers/perf/coresight_pmu/Makefile           |    6 +
>   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1315 +++++++++++++++++
>   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>   7 files changed, 1482 insertions(+)
>   create mode 100644 drivers/perf/coresight_pmu/Kconfig
>   create mode 100644 drivers/perf/coresight_pmu/Makefile
>   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
>   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 2ca8b1b336d2..8f2120182b25 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
>   CONFIG_PHY_TEGRA_XUSB=y
>   CONFIG_PHY_AM654_SERDES=m
>   CONFIG_PHY_J721E_WIZ=m
> +CONFIG_ARM_CORESIGHT_PMU=y
>   CONFIG_ARM_SMMU_V3_PMU=m
>   CONFIG_FSL_IMX8_DDR_PMU=m
>   CONFIG_QCOM_L2_PMU=y
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 1e2d69453771..c4e7cd5b4162 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
>   	  Enable perf support for Marvell DDR Performance monitoring
>   	  event on CN10K platform.
>   
> +source "drivers/perf/coresight_pmu/Kconfig"
> +
>   endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 57a279c61df5..4126a04b5583 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
>   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> +obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
> diff --git a/drivers/perf/coresight_pmu/Kconfig b/drivers/perf/coresight_pmu/Kconfig
> new file mode 100644
> index 000000000000..487dfee71ad1
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/Kconfig
> @@ -0,0 +1,10 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +
> +config ARM_CORESIGHT_PMU
> +	tristate "ARM Coresight PMU"
> +	depends on ARM64 && ACPI_APMT

There shouldn't be any functional dependency on any CPU architecture here.

> +	help
> +	  Provides support for Performance Monitoring Unit (PMU) events based on
> +	  ARM CoreSight PMU architecture.
> \ No newline at end of file
> diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
> new file mode 100644
> index 000000000000..a2a7a5fbbc16
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/Makefile
> @@ -0,0 +1,6 @@
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +#
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
> +	arm_coresight_pmu.o
> diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> new file mode 100644
> index 000000000000..1e9553d29717
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> @@ -0,0 +1,1315 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ARM CoreSight PMU driver.
> + *
> + * This driver adds support for uncore PMU based on ARM CoreSight Performance
> + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
> + * like other uncore PMUs, it does not support process specific events and
> + * cannot be used in sampling mode.
> + *
> + * This code is based on other uncore PMUs like ARM DSU PMU. It provides a
> + * generic implementation to operate the PMU according to CoreSight PMU
> + * architecture and ACPI ARM PMU table (APMT) documents below:
> + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
> + *   - APMT document number: ARM DEN0117.
> + * The description of the PMU, like the PMU device identification, available
> + * events, and configuration options, is vendor specific. The driver provides
> + * interface for vendor specific code to get this information. This allows the
> + * driver to be shared with PMU from different vendors.
> + *
> + * CoreSight PMU devices are named as arm_coresight_pmu<node_id> where <node_id>
> + * is APMT node id. The description of the device, like the identifier,
> + * supported events, and formats can be found in sysfs
> + * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
> + *
> + * The user should refer to the vendor technical documentation to get details
> + * about the supported events.
> + *
> + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + *
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/ctype.h>
> +#include <linux/interrupt.h>
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <acpi/processor.h>
> +
> +#include "arm_coresight_pmu.h"
> +
> +#define PMUNAME "arm_coresight_pmu"
> +
> +#define CORESIGHT_CPUMASK_ATTR(_name, _config)				\
> +	CORESIGHT_EXT_ATTR(_name, coresight_pmu_cpumask_show,		\
> +			   (unsigned long)_config)
> +
> +/**
> + * Register offsets based on CoreSight Performance Monitoring Unit Architecture
> + * Document number: ARM-ECM-0640169 00alp6
> + */
> +#define PMEVCNTR_LO					0x0
> +#define PMEVCNTR_HI					0x4
> +#define PMEVTYPER					0x400
> +#define PMCCFILTR					0x47C
> +#define PMEVFILTR					0xA00
> +#define PMCNTENSET					0xC00
> +#define PMCNTENCLR					0xC20
> +#define PMINTENSET					0xC40
> +#define PMINTENCLR					0xC60
> +#define PMOVSCLR					0xC80
> +#define PMOVSSET					0xCC0
> +#define PMCFGR						0xE00
> +#define PMCR						0xE04
> +#define PMIIDR						0xE08
> +
> +/* PMCFGR register field */
> +#define PMCFGR_NCG_SHIFT				28
> +#define PMCFGR_NCG_MASK					0xf
> +#define PMCFGR_HDBG					BIT(24)
> +#define PMCFGR_TRO					BIT(23)
> +#define PMCFGR_SS					BIT(22)
> +#define PMCFGR_FZO					BIT(21)
> +#define PMCFGR_MSI					BIT(20)
> +#define PMCFGR_UEN					BIT(19)
> +#define PMCFGR_NA					BIT(17)
> +#define PMCFGR_EX					BIT(16)
> +#define PMCFGR_CCD					BIT(15)
> +#define PMCFGR_CC					BIT(14)
> +#define PMCFGR_SIZE_SHIFT				8
> +#define PMCFGR_SIZE_MASK				0x3f
> +#define PMCFGR_N_SHIFT					0
> +#define PMCFGR_N_MASK					0xff
> +
> +/* PMCR register field */
> +#define PMCR_TRO					BIT(11)
> +#define PMCR_HDBG					BIT(10)
> +#define PMCR_FZO					BIT(9)
> +#define PMCR_NA						BIT(8)
> +#define PMCR_DP						BIT(5)
> +#define PMCR_X						BIT(4)
> +#define PMCR_D						BIT(3)
> +#define PMCR_C						BIT(2)
> +#define PMCR_P						BIT(1)
> +#define PMCR_E						BIT(0)
> +
> +/* PMIIDR register field */
> +#define PMIIDR_IMPLEMENTER_MASK				0xFFF
> +#define PMIIDR_PRODUCTID_MASK				0xFFF
> +#define PMIIDR_PRODUCTID_SHIFT				20
> +
> +/* Each SET/CLR register supports up to 32 counters. */
> +#define CORESIGHT_SET_CLR_REG_COUNTER_NUM		32
> +#define CORESIGHT_SET_CLR_REG_COUNTER_SHIFT		5
> +
> +/* The number of 32-bit SET/CLR register that can be supported. */
> +#define CORESIGHT_SET_CLR_REG_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
> +
> +static_assert((CORESIGHT_SET_CLR_REG_MAX_NUM *
> +	       CORESIGHT_SET_CLR_REG_COUNTER_NUM) >=
> +	      CORESIGHT_PMU_MAX_HW_CNTRS);
> +
> +/* Convert counter idx into SET/CLR register number. */
> +#define CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx)				\
> +	(idx >> CORESIGHT_SET_CLR_REG_COUNTER_SHIFT)
> +
> +/* Convert counter idx into SET/CLR register bit. */
> +#define CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx)				\
> +	(idx & (CORESIGHT_SET_CLR_REG_COUNTER_NUM - 1))
> +
> +#define CORESIGHT_ACTIVE_CPU_MASK			0x0
> +#define CORESIGHT_ASSOCIATED_CPU_MASK			0x1
> +
> +#define CORESIGHT_EVENT_MASK				0xFFFFFFFFULL
> +#define CORESIGHT_FILTER_MASK				0xFFFFFFFFULL
> +#define CORESIGHT_FILTER_SHIFT				32ULL
> +
> +/* Check if field f in flags is set with value v */
> +#define CHECK_APMT_FLAG(flags, f, v) \
> +	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
> +
> +static unsigned long coresight_pmu_cpuhp_state;
> +
> +/*
> + * In CoreSight PMU architecture, all of the MMIO registers are 32-bit except
> + * counter register. The counter register can be implemented as 32-bit or 64-bit
> + * register depending on the value of PMCFGR.SIZE field. For 64-bit access,
> + * single-copy 64-bit atomic support is implementation defined. APMT node flag
> + * is used to identify if the PMU supports 64-bit single copy atomic. If 64-bit
> + * single copy atomic is not supported, the driver treats the register as a pair
> + * of 32-bit register.
> + */
> +
> +/*
> + * Read 32-bit register.
> + *
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + * @return 32-bit value of the register.
> + */
> +static inline u32 read_reg32(void __iomem *base, u32 offset)
> +{
> +	return readl(base + offset);
> +}

read_reg32(x, y);
readl(x + y);

These kinds of wrappers are just about reasonable when they encapsulate a 
structure dereference or some computation to transform the offset, but 
having 13 extra lines plus 4 extra characters per callsite purely to 
obfuscate an addition seems objectively worse than not doing that.

> +
> +/*
> + * Read 64-bit register using single 64-bit atomic copy.
> + *
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + * @return 64-bit value of the register.
> + */
> +static u64 read_reg64(void __iomem *base, u32 offset)
> +{
> +	return readq(base + offset);
> +}
> +
> +/*
> + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> + *
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + * @return 64-bit value of the register pair.
> + */
> +static u64 read_reg64_hilohi(void __iomem *base, u32 offset)
> +{
> +	u32 val_lo, val_hi;
> +	u64 val;
> +
> +	/* Use high-low-high sequence to avoid tearing */
> +	do {
> +		val_hi = read_reg32(base, offset + 4);
> +		val_lo = read_reg32(base, offset);
> +	} while (val_hi != read_reg32(base, offset + 4));
> +
> +	val = (((u64)val_hi << 32) | val_lo);
> +
> +	return val;
> +}
> +
> +/*
> + * Write to 32-bit register.
> + *
> + * @val     : 32-bit value to write.
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + */
> +static inline void write_reg32(u32 val, void __iomem *base, u32 offset)
> +{
> +	writel(val, base + offset);
> +}
> +
> +/*
> + * Write to 64-bit register using single 64-bit atomic copy.
> + *
> + * @val     : 64-bit value to write.
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + */
> +static void write_reg64(u64 val, void __iomem *base, u32 offset)
> +{
> +	writeq(val, base + offset);
> +}
> +
> +/*
> + * Write to 64-bit register as a pair of 32-bit registers.
> + *
> + * @val     : 64-bit value to write.
> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> + * @offset  : register offset.
> + *
> + */
> +static void write_reg64_lohi(u64 val, void __iomem *base, u32 offset)
> +{
> +	u32 val_lo, val_hi;
> +
> +	val_hi = upper_32_bits(val);
> +	val_lo = lower_32_bits(val);
> +
> +	write_reg32(val_lo, base, offset);
> +	write_reg32(val_hi, base, offset + 4);
> +}

#include <linux/io-64-nonatomic-lo-hi.h>

> +
> +/* Check if cycle counter is supported. */
> +static inline bool support_cc(const struct coresight_pmu *coresight_pmu)
> +{
> +	return (coresight_pmu->pmcfgr & PMCFGR_CC);
> +}
> +
> +/* Get counter size. */
> +static inline u32 pmcfgr_size(const struct coresight_pmu *coresight_pmu)
> +{
> +	return (coresight_pmu->pmcfgr >> PMCFGR_SIZE_SHIFT) & PMCFGR_SIZE_MASK;
> +}
> +
> +/* Check if counter is implemented as 64-bit register. */
> +static inline bool
> +use_64b_counter_reg(const struct coresight_pmu *coresight_pmu)
> +{
> +	return (pmcfgr_size(coresight_pmu) > 31);
> +}
> +
> +/* Get number of counters, minus one. */
> +static inline u32 pmcfgr_n(const struct coresight_pmu *coresight_pmu)
> +{
> +	return (coresight_pmu->pmcfgr >> PMCFGR_N_SHIFT) & PMCFGR_N_MASK;
> +}
> +
> +ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
> +				       struct device_attribute *attr, char *buf)
> +{
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	return sysfs_emit(buf, "event=0x%llx\n",
> +			  (unsigned long long)eattr->var);
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_event_show);
> +
> +/**
> + * Event list of PMU that does not support cycle counter. Currently the
> + * CoreSight PMU spec does not define standard events, so it is empty now.
> + */
> +static struct attribute *coresight_pmu_event_attrs[] = {
> +	NULL,
> +};
> +
> +/* Event list of PMU supporting cycle counter. */
> +static struct attribute *coresight_pmu_event_attrs_cc[] = {
> +	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> +struct attribute **
> +coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
> +{
> +	return (support_cc(coresight_pmu)) ? coresight_pmu_event_attrs_cc :
> +					     coresight_pmu_event_attrs;
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_get_event_attrs);

If cycle count is a standard but optional event, just include it in the 
standard event attrs and use .is_visible to filter it out when not 
present. No need for this overcomplicated machinery.

> +
> +ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_format_show);
> +
> +static struct attribute *coresight_pmu_format_attrs[] = {
> +	CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
> +	CORESIGHT_FORMAT_ATTR(filter, "config:32-63"),
> +	NULL,
> +};
> +
> +struct attribute **
> +coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
> +{
> +	return coresight_pmu_format_attrs;
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_get_format_attrs);
> +
> +u32 coresight_pmu_event_type(const struct perf_event *event)
> +{
> +	return event->attr.config & CORESIGHT_EVENT_MASK;
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_event_type);
> +
> +u32 coresight_pmu_event_filter(const struct perf_event *event)
> +{
> +	return (event->attr.config >> CORESIGHT_FILTER_SHIFT) &
> +	       CORESIGHT_FILTER_MASK;
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_event_filter);
> +
> +static ssize_t coresight_pmu_identifier_show(struct device *dev,
> +					     struct device_attribute *attr,
> +					     char *page)
> +{
> +	struct coresight_pmu *coresight_pmu =
> +		to_coresight_pmu(dev_get_drvdata(dev));
> +
> +	return sysfs_emit(page, "%s\n", coresight_pmu->identifier);
> +}
> +
> +static struct device_attribute coresight_pmu_identifier_attr =
> +	__ATTR(identifier, 0444, coresight_pmu_identifier_show, NULL);
> +
> +static struct attribute *coresight_pmu_identifier_attrs[] = {
> +	&coresight_pmu_identifier_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group coresight_pmu_identifier_attr_group = {
> +	.attrs = coresight_pmu_identifier_attrs,
> +};
> +
> +const char *
> +coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
> +{
> +	const char *identifier =
> +		devm_kasprintf(coresight_pmu->dev, GFP_KERNEL, "%x",
> +			       coresight_pmu->impl.pmiidr);
> +	return identifier;
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_get_identifier);
> +
> +static ssize_t coresight_pmu_cpumask_show(struct device *dev,
> +					  struct device_attribute *attr,
> +					  char *buf)
> +{
> +	struct pmu *pmu = dev_get_drvdata(dev);
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	unsigned long mask_id = (unsigned long)eattr->var;
> +	const cpumask_t *cpumask;
> +
> +	switch (mask_id) {
> +	case CORESIGHT_ACTIVE_CPU_MASK:
> +		cpumask = &coresight_pmu->active_cpu;
> +		break;
> +	case CORESIGHT_ASSOCIATED_CPU_MASK:
> +		cpumask = &coresight_pmu->associated_cpus;
> +		break;
> +	default:
> +		return 0;
> +	}
> +	return cpumap_print_to_pagebuf(true, buf, cpumask);
> +}
> +
> +static struct attribute *coresight_pmu_cpumask_attrs[] = {
> +	CORESIGHT_CPUMASK_ATTR(cpumask, CORESIGHT_ACTIVE_CPU_MASK),
> +	CORESIGHT_CPUMASK_ATTR(associated_cpus, CORESIGHT_ASSOCIATED_CPU_MASK),
> +	NULL,
> +};
> +
> +static struct attribute_group coresight_pmu_cpumask_attr_group = {
> +	.attrs = coresight_pmu_cpumask_attrs,
> +};
> +
> +static const struct coresight_pmu_impl_ops default_impl_ops = {
> +	.get_event_attrs	= coresight_pmu_get_event_attrs,
> +	.get_format_attrs	= coresight_pmu_get_format_attrs,
> +	.get_identifier		= coresight_pmu_get_identifier,
> +	.is_cc_event		= coresight_pmu_is_cc_event,
> +	.event_type		= coresight_pmu_event_type,
> +	.event_filter		= coresight_pmu_event_filter
> +};
> +
> +struct impl_match {
> +	u32 jedec_jep106_id;
> +	int (*impl_init_ops)(struct coresight_pmu *coresight_pmu);
> +};
> +
> +static const struct impl_match impl_match[] = {
> +	{}
> +};
> +
> +static int coresight_pmu_init_impl_ops(struct coresight_pmu *coresight_pmu)
> +{
> +	int idx, ret;
> +	u32 jedec_id;
> +	struct acpi_apmt_node *apmt_node = coresight_pmu->apmt_node;
> +	const struct impl_match *match = impl_match;
> +
> +	/*
> +	 * Get PMU implementer and product id from APMT node.
> +	 * If APMT node doesn't have implementer/product id, try get it
> +	 * from PMIIDR.
> +	 */
> +	coresight_pmu->impl.pmiidr =
> +		(apmt_node->impl_id) ? apmt_node->impl_id :
> +				       read_reg32(coresight_pmu->base0, PMIIDR);

The spec says the opposite, that the APMT field should be ignored if 
PMIIDR or PMPIDR is present.

> +	jedec_id = coresight_pmu->impl.pmiidr & PMIIDR_IMPLEMENTER_MASK;
> +
> +	/* Find implementer specific attribute ops. */
> +	for (idx = 0; match->jedec_jep106_id; match++, idx++) {
> +		if (match->jedec_jep106_id == jedec_id) {

I reckon we could simply have (value,mask) pairs in impl_match to 
directly match the whole IIDR value to some implementation ops, and save 
some bother here. It could always be refactored if and when a sufficient 
number of different implementations exist to make that approach unwieldy.

> +			ret = match->impl_init_ops(coresight_pmu);
> +			if (ret)
> +				return ret;
> +
> +			return 0;
> +		}
> +	}
> +
> +	/* We don't find implementer specific attribute ops, use default. */
> +	coresight_pmu->impl.ops = &default_impl_ops;
> +	return 0;
> +}
> +
> +static struct attribute_group *
> +coresight_pmu_alloc_event_attr_group(struct coresight_pmu *coresight_pmu)
> +{
> +	struct attribute_group *event_group;
> +	struct device *dev = coresight_pmu->dev;
> +
> +	event_group =
> +		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> +	if (!event_group)
> +		return NULL;
> +
> +	event_group->name = "events";
> +	event_group->attrs =
> +		coresight_pmu->impl.ops->get_event_attrs(coresight_pmu);
> +
> +	return event_group;
> +}
> +
> +static struct attribute_group *
> +coresight_pmu_alloc_format_attr_group(struct coresight_pmu *coresight_pmu)
> +{
> +	struct attribute_group *format_group;
> +	struct device *dev = coresight_pmu->dev;
> +
> +	format_group =
> +		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> +	if (!format_group)
> +		return NULL;
> +
> +	format_group->name = "format";
> +	format_group->attrs =
> +		coresight_pmu->impl.ops->get_format_attrs(coresight_pmu);
> +
> +	return format_group;
> +}
> +
> +static struct attribute_group **
> +coresight_pmu_alloc_attr_group(struct coresight_pmu *coresight_pmu)
> +{
> +	const struct coresight_pmu_impl_ops *impl_ops;
> +	struct attribute_group **attr_groups = NULL;
> +	struct device *dev = coresight_pmu->dev;
> +	int ret;
> +
> +	ret = coresight_pmu_init_impl_ops(coresight_pmu);
> +	if (ret)
> +		return NULL;
> +
> +	impl_ops = coresight_pmu->impl.ops;
> +
> +	coresight_pmu->identifier = impl_ops->get_identifier(coresight_pmu);
> +
> +	attr_groups = devm_kzalloc(dev, 5 * sizeof(struct attribute_group *),
> +				   GFP_KERNEL);
> +	if (!attr_groups)
> +		return NULL;
> +
> +	attr_groups[0] = coresight_pmu_alloc_event_attr_group(coresight_pmu);
> +	attr_groups[1] = coresight_pmu_alloc_format_attr_group(coresight_pmu);
> +	attr_groups[2] = &coresight_pmu_identifier_attr_group;
> +	attr_groups[3] = &coresight_pmu_cpumask_attr_group;
> +
> +	return attr_groups;
> +}
> +
> +static inline void
> +coresight_pmu_start_counters(struct coresight_pmu *coresight_pmu)
> +{
> +	u32 pmcr;
> +
> +	pmcr = read_reg32(coresight_pmu->base0, PMCR);
> +	pmcr |= PMCR_E;
> +	write_reg32(pmcr, coresight_pmu->base0, PMCR);
> +}
> +
> +static inline void
> +coresight_pmu_stop_counters(struct coresight_pmu *coresight_pmu)
> +{
> +	u32 pmcr;
> +
> +	pmcr = read_reg32(coresight_pmu->base0, PMCR);
> +	pmcr &= ~PMCR_E;
> +	write_reg32(pmcr, coresight_pmu->base0, PMCR);
> +}

I'm inclined to think these shouldn't be read-modify-write 
implementations. Arguably the driver should reset the control register 
to a known state initially, so from then on it can simply write new 
values based on what it knows it's changing.

AFAICS from the spec only the PMCR.E bit has a defined reset value, so 
preserving random values in other bits like FZO and D is sure to be fun.

> +
> +static void coresight_pmu_enable(struct pmu *pmu)
> +{
> +	int enabled;
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> +
> +	enabled = bitmap_weight(coresight_pmu->hw_events.used_ctrs,
> +				CORESIGHT_PMU_MAX_HW_CNTRS);
> +
> +	if (!enabled)
> +		return;

Use bitmap_empty() for checking if a bitmap is empty.

> +
> +	coresight_pmu_start_counters(coresight_pmu);
> +}
> +
> +static void coresight_pmu_disable(struct pmu *pmu)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> +
> +	coresight_pmu_stop_counters(coresight_pmu);
> +}
> +
> +static inline bool is_cycle_cntr_idx(const struct perf_event *event)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	int idx = event->hw.idx;
> +
> +	return (support_cc(coresight_pmu) && idx == CORESIGHT_PMU_IDX_CCNTR);

If we don't support cycle counting, cycles count events should have been 
rejected in event_init. If they're able to propagate further than that

> +}
> +
> +bool coresight_pmu_is_cc_event(const struct perf_event *event)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	u32 evtype = coresight_pmu->impl.ops->event_type(event);
> +
> +	return (support_cc(coresight_pmu) &&

Ditto.

> +		evtype == CORESIGHT_PMU_EVT_CYCLES_DEFAULT);
> +}
> +EXPORT_SYMBOL_GPL(coresight_pmu_is_cc_event);
> +
> +static int
> +coresight_pmu_get_event_idx(struct coresight_pmu_hw_events *hw_events,
> +			    struct perf_event *event)
> +{
> +	int idx, reserve_cc;
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +
> +	if (coresight_pmu->impl.ops->is_cc_event(event)) {
> +		/* Search for available cycle counter. */
> +		if (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
> +				     hw_events->used_ctrs))
> +			return -EAGAIN;
> +
> +		return CORESIGHT_PMU_IDX_CCNTR;
> +	}
> +
> +	/*
> +	 * CoreSight PMU can support up to 256 counters. The cycle counter is
> +	 * always on counter[31]. To prevent regular event from using cycle
> +	 * counter, we reserve the cycle counter bit temporarily.
> +	 */
> +	reserve_cc = 0;
> +	if (support_cc(coresight_pmu) &&
> +	    coresight_pmu->num_adj_counters >= CORESIGHT_PMU_IDX_CCNTR)
> +		reserve_cc = (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
> +					       hw_events->used_ctrs) == 0);

It would seem a lot easier to reserve PMEVCNTR[31] permanently and track 
allocation of PMCCNTR with a separate flag, when appropriate.

> +
> +	/* Search available regular counter from the used counter bitmap. */
> +	idx = find_first_zero_bit(hw_events->used_ctrs,
> +				  coresight_pmu->num_adj_counters);
> +
> +	/* Restore cycle counter bit. */
> +	if (reserve_cc)
> +		clear_bit(CORESIGHT_PMU_IDX_CCNTR, hw_events->used_ctrs);
> +
> +	if (idx >= coresight_pmu->num_adj_counters)
> +		return -EAGAIN;
> +
> +	set_bit(idx, hw_events->used_ctrs);
> +
> +	return idx;
> +}
> +
> +static bool
> +coresight_pmu_validate_event(struct pmu *pmu,
> +			     struct coresight_pmu_hw_events *hw_events,
> +			     struct perf_event *event)
> +{
> +	if (is_software_event(event))
> +		return true;
> +
> +	/* Reject groups spanning multiple HW PMUs. */
> +	if (event->pmu != pmu)
> +		return false;
> +
> +	return (coresight_pmu_get_event_idx(hw_events, event) >= 0);
> +}
> +
> +/**
> + * Make sure the group of events can be scheduled at once
> + * on the PMU.
> + */
> +static bool coresight_pmu_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader = event->group_leader;
> +	struct coresight_pmu_hw_events fake_hw_events;

Do you not get a compile-time warning about this?

> +	if (event->group_leader == event)
> +		return true;
> +
> +	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> +
> +	if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events, leader))
> +		return false;
> +
> +	for_each_sibling_event(sibling, leader) {
> +		if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events,
> +						  sibling))
> +			return false;
> +	}
> +
> +	return coresight_pmu_validate_event(event->pmu, &fake_hw_events, event);
> +}
> +
> +static int coresight_pmu_event_init(struct perf_event *event)
> +{
> +	struct coresight_pmu *coresight_pmu;
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	coresight_pmu = to_coresight_pmu(event->pmu);
> +
> +	/**

This isn't kerneldoc.

> +	 * Following other "uncore" PMUs, we do not support sampling mode or
> +	 * attach to a task (per-process mode).
> +	 */
> +	if (is_sampling_event(event)) {
> +		dev_dbg(coresight_pmu->pmu.dev,
> +			"Can't support sampling events\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> +		dev_dbg(coresight_pmu->pmu.dev,
> +			"Can't support per-task counters\n");
> +		return -EINVAL;
> +	}
> +
> +	/**

Ditto.

> +	 * Make sure the CPU assignment is on one of the CPUs associated with
> +	 * this PMU.
> +	 */
> +	if (!cpumask_test_cpu(event->cpu, &coresight_pmu->associated_cpus)) {
> +		dev_dbg(coresight_pmu->pmu.dev,
> +			"Requested cpu is not associated with the PMU\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Enforce the current active CPU to handle the events in this PMU. */
> +	event->cpu = cpumask_first(&coresight_pmu->active_cpu);
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!coresight_pmu_validate_group(event))
> +		return -EINVAL;
> +
> +	/**

Ditto.

> +	 * We don't assign an index until we actually place the event onto
> +	 * hardware. Use -1 to signify that we haven't decided where to put it
> +	 * yet.
> +	 */
> +	hwc->idx = -1;
> +	hwc->config_base = coresight_pmu->impl.ops->event_type(event);
> +
> +	return 0;
> +}
> +
> +static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
> +{
> +	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
> +}
> +
> +static void coresight_pmu_write_counter(struct perf_event *event, u64 val)
> +{
> +	u32 offset;
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +
> +	if (use_64b_counter_reg(coresight_pmu)) {
> +		offset = counter_offset(sizeof(u64), event->hw.idx);
> +
> +		coresight_pmu->write_reg64(val, coresight_pmu->base1, offset);
> +	} else {
> +		offset = counter_offset(sizeof(u32), event->hw.idx);
> +
> +		write_reg32(lower_32_bits(val), coresight_pmu->base1, offset);
> +	}
> +}
> +
> +static u64 coresight_pmu_read_counter(struct perf_event *event)
> +{
> +	u32 offset;
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +
> +	if (use_64b_counter_reg(coresight_pmu)) {
> +		offset = counter_offset(sizeof(u64), event->hw.idx);
> +		return coresight_pmu->read_reg64(coresight_pmu->base1, offset);
> +	}
> +
> +	offset = counter_offset(sizeof(u32), event->hw.idx);
> +	return read_reg32(coresight_pmu->base1, offset);
> +}
> +
> +/**
> + * coresight_pmu_set_event_period: Set the period for the counter.
> + *
> + * To handle cases of extreme interrupt latency, we program
> + * the counter with half of the max count for the counters.
> + */
> +static void coresight_pmu_set_event_period(struct perf_event *event)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	u64 val = GENMASK_ULL(pmcfgr_size(coresight_pmu), 0) >> 1;
> +
> +	local64_set(&event->hw.prev_count, val);
> +	coresight_pmu_write_counter(event, val);
> +}
> +
> +static void coresight_pmu_enable_counter(struct coresight_pmu *coresight_pmu,
> +					 int idx)
> +{
> +	u32 reg_id, reg_bit, inten_off, cnten_off;
> +
> +	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
> +	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
> +
> +	inten_off = PMINTENSET + (4 * reg_id);
> +	cnten_off = PMCNTENSET + (4 * reg_id);
> +
> +	write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
> +	write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
> +}
> +
> +static void coresight_pmu_disable_counter(struct coresight_pmu *coresight_pmu,
> +					  int idx)
> +{
> +	u32 reg_id, reg_bit, inten_off, cnten_off;
> +
> +	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
> +	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
> +
> +	inten_off = PMINTENCLR + (4 * reg_id);
> +	cnten_off = PMCNTENCLR + (4 * reg_id);
> +
> +	write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
> +	write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
> +}
> +
> +static void coresight_pmu_event_update(struct perf_event *event)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 delta, prev, now;
> +
> +	do {
> +		prev = local64_read(&hwc->prev_count);
> +		now = coresight_pmu_read_counter(event);
> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> +
> +	delta = (now - prev) & GENMASK_ULL(pmcfgr_size(coresight_pmu), 0);
> +	local64_add(delta, &event->count);
> +}
> +
> +static inline void coresight_pmu_set_event(struct coresight_pmu *coresight_pmu,
> +					   struct hw_perf_event *hwc)
> +{
> +	u32 offset = PMEVTYPER + (4 * hwc->idx);
> +
> +	write_reg32(hwc->config_base, coresight_pmu->base0, offset);
> +}
> +
> +static inline void
> +coresight_pmu_set_ev_filter(struct coresight_pmu *coresight_pmu,
> +			    struct hw_perf_event *hwc, u32 filter)
> +{
> +	u32 offset = PMEVFILTR + (4 * hwc->idx);
> +
> +	write_reg32(filter, coresight_pmu->base0, offset);
> +}
> +
> +static inline void
> +coresight_pmu_set_cc_filter(struct coresight_pmu *coresight_pmu, u32 filter)
> +{
> +	u32 offset = PMCCFILTR;
> +
> +	write_reg32(filter, coresight_pmu->base0, offset);
> +}
> +
> +static void coresight_pmu_start(struct perf_event *event, int pmu_flags)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u32 filter;
> +
> +	/* We always reprogram the counter */
> +	if (pmu_flags & PERF_EF_RELOAD)
> +		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
> +
> +	coresight_pmu_set_event_period(event);
> +
> +	filter = coresight_pmu->impl.ops->event_filter(event);
> +
> +	if (is_cycle_cntr_idx(event)) {
> +		coresight_pmu_set_cc_filter(coresight_pmu, filter);
> +	} else {
> +		coresight_pmu_set_event(coresight_pmu, hwc);
> +		coresight_pmu_set_ev_filter(coresight_pmu, hwc, filter);
> +	}
> +
> +	hwc->state = 0;
> +
> +	coresight_pmu_enable_counter(coresight_pmu, hwc->idx);
> +}
> +
> +static void coresight_pmu_stop(struct perf_event *event, int pmu_flags)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (hwc->state & PERF_HES_STOPPED)
> +		return;
> +
> +	coresight_pmu_disable_counter(coresight_pmu, hwc->idx);
> +	coresight_pmu_event_update(event);
> +
> +	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +}
> +
> +static int coresight_pmu_add(struct perf_event *event, int flags)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx;
> +
> +	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &coresight_pmu->associated_cpus)))
> +		return -ENOENT;
> +
> +	idx = coresight_pmu_get_event_idx(hw_events, event);
> +	if (idx < 0)
> +		return idx;
> +
> +	hw_events->events[idx] = event;
> +	hwc->idx = idx;
> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +
> +	if (flags & PERF_EF_START)
> +		coresight_pmu_start(event, PERF_EF_RELOAD);
> +
> +	/* Propagate changes to the userspace mapping. */
> +	perf_event_update_userpage(event);
> +
> +	return 0;
> +}
> +
> +static void coresight_pmu_del(struct perf_event *event, int flags)
> +{
> +	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> +	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->idx;
> +
> +	coresight_pmu_stop(event, PERF_EF_UPDATE);
> +
> +	hw_events->events[idx] = NULL;
> +
> +	clear_bit(idx, hw_events->used_ctrs);
> +
> +	perf_event_update_userpage(event);
> +}
> +
> +static void coresight_pmu_read(struct perf_event *event)
> +{
> +	coresight_pmu_event_update(event);
> +}
> +
> +static int coresight_pmu_alloc(struct platform_device *pdev,
> +			       struct coresight_pmu **coresight_pmu)
> +{
> +	struct acpi_apmt_node *apmt_node;
> +	struct device *dev;
> +	struct coresight_pmu *pmu;
> +
> +	dev = &pdev->dev;
> +	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> +	if (!apmt_node) {
> +		dev_err(dev, "failed to get APMT node\n");
> +		return -ENOMEM;
> +	}
> +
> +	pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
> +	if (!pmu)
> +		return -ENOMEM;
> +
> +	*coresight_pmu = pmu;
> +
> +	pmu->dev = dev;
> +	pmu->apmt_node = apmt_node;
> +	pmu->name =
> +		devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);
> +
> +	platform_set_drvdata(pdev, coresight_pmu);
> +
> +	return 0;
> +}
> +
> +static int coresight_pmu_init_mmio(struct coresight_pmu *coresight_pmu)
> +{
> +	struct device *dev;
> +	struct platform_device *pdev;
> +	struct resource *res;
> +	struct acpi_apmt_node *apmt_node;
> +
> +	dev = coresight_pmu->dev;
> +	pdev = to_platform_device(dev);
> +	apmt_node = coresight_pmu->apmt_node;
> +
> +	/* Base address for page 0. */
> +	res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> +	if (!res) {
> +		dev_err(dev, "failed to get page-0 resource\n");
> +		return -ENOMEM;
> +	}
> +
> +	coresight_pmu->base0 = devm_ioremap_resource(dev, res);
> +	if (IS_ERR(coresight_pmu->base0)) {
> +		dev_err(dev, "ioremap failed for page-0 resource\n");
> +		return PTR_ERR(coresight_pmu->base0);
> +	}

devm_platform_ioremap_resource()

> +	/* Base address for page 1 if supported. Otherwise point it to page 0. */
> +	coresight_pmu->base1 = coresight_pmu->base0;
> +	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
> +		res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> +		if (!res) {
> +			dev_err(dev, "failed to get page-1 resource\n");
> +			return -ENOMEM;
> +		}
> +
> +		coresight_pmu->base1 = devm_ioremap_resource(dev, res);
> +		if (IS_ERR(coresight_pmu->base1)) {
> +			dev_err(dev, "ioremap failed for page-1 resource\n");
> +			return PTR_ERR(coresight_pmu->base1);
> +		}

Ditto.

> +	}
> +
> +	if (CHECK_APMT_FLAG(apmt_node->flags, ATOMIC, SUPP)) {
> +		coresight_pmu->read_reg64 = &read_reg64;
> +		coresight_pmu->write_reg64 = &write_reg64;
> +	} else {
> +		coresight_pmu->read_reg64 = &read_reg64_hilohi;
> +		coresight_pmu->write_reg64 = &write_reg64_lohi;
> +	}
> +
> +	coresight_pmu->pmcfgr = read_reg32(coresight_pmu->base0, PMCFGR);
> +
> +	coresight_pmu->num_adj_counters = pmcfgr_n(coresight_pmu) + 1;
> +
> +	if (support_cc(coresight_pmu)) {
> +		/**
> +		 * Exclude the cycle counter if there is a gap between
> +		 * cycle counter id and the last regular event counter id.
> +		 */
> +		if (coresight_pmu->num_adj_counters <= CORESIGHT_PMU_IDX_CCNTR)
> +			coresight_pmu->num_adj_counters -= 1;

As before, I think it would be a fair bit clearer to maintain a 
distinction between the number of PMEV{TYPE,CNT,FILT}R registers present 
and the number of logical counters actually usable.
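Roughly like this (hypothetical names, and assuming the cycle counter, when implemented, is counted in PMCFGR.N): derive both quantities once at probe time instead of adjusting a single count in place:

```c
#include <stdbool.h>

/* Hypothetical split between what the hardware reports and what is usable. */
struct counter_geometry {
	unsigned int num_cntr_regs;  /* PMEV{TYPE,CNT,FILT}R registers present */
	unsigned int num_evt_cntrs;  /* regular event counters actually usable */
	bool has_ccntr;              /* cycle counter (PMCCNTR) implemented */
};

/* PMCFGR.N encodes the counter count minus one; when the cycle counter is
 * implemented it consumes one of those slots, so exclude it from the
 * regular-event count up front instead of special-casing it later. */
static void geometry_init(struct counter_geometry *g, unsigned int pmcfgr_n,
			  bool has_ccntr)
{
	g->num_cntr_regs = pmcfgr_n + 1;
	g->has_ccntr = has_ccntr;
	g->num_evt_cntrs = has_ccntr ? g->num_cntr_regs - 1 : g->num_cntr_regs;
}
```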

> +	}
> +
> +	coresight_pmu->num_set_clr_reg =
> +		round_up(coresight_pmu->num_adj_counters,
> +			 CORESIGHT_SET_CLR_REG_COUNTER_NUM) /
> +		CORESIGHT_SET_CLR_REG_COUNTER_NUM;

DIV_ROUND_UP()
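i.e. the round_up()-then-divide collapses to a single DIV_ROUND_UP(). A quick sketch, assuming CORESIGHT_SET_CLR_REG_COUNTER_NUM is 32 (one enable/overflow bit per counter in each 32-bit set/clear register):

```c
/* DIV_ROUND_UP() as defined in include/linux/math.h. */
#define DIV_ROUND_UP(n, d) (((n) + (d) - 1) / (d))

#define SET_CLR_REG_COUNTER_NUM 32

static unsigned int num_set_clr_regs(unsigned int num_counters)
{
	return DIV_ROUND_UP(num_counters, SET_CLR_REG_COUNTER_NUM);
}
```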

> +
> +	return 0;
> +}
> +
> +static inline int
> +coresight_pmu_get_reset_overflow(struct coresight_pmu *coresight_pmu,
> +				 u32 *pmovs)
> +{
> +	int i;
> +	u32 pmovclr_offset = PMOVSCLR;
> +	u32 has_overflowed = 0;
> +
> +	for (i = 0; i < coresight_pmu->num_set_clr_reg; ++i) {
> +		pmovs[i] = read_reg32(coresight_pmu->base1, pmovclr_offset);
> +		has_overflowed |= pmovs[i];
> +		write_reg32(pmovs[i], coresight_pmu->base1, pmovclr_offset);
> +		pmovclr_offset += sizeof(u32);
> +	}
> +
> +	return has_overflowed != 0;
> +}
> +
> +static irqreturn_t coresight_pmu_handle_irq(int irq_num, void *dev)
> +{
> +	int idx, has_overflowed;
> +	struct coresight_pmu *coresight_pmu = dev;
> +	u32 pmovs[CORESIGHT_SET_CLR_REG_MAX_NUM] = { 0 };
> +	bool handled = false;
> +
> +	coresight_pmu_stop_counters(coresight_pmu);
> +
> +	has_overflowed = coresight_pmu_get_reset_overflow(coresight_pmu, pmovs);
> +	if (!has_overflowed)
> +		goto done;
> +
> +	for_each_set_bit(idx, (unsigned long *)pmovs,
> +			 CORESIGHT_PMU_MAX_HW_CNTRS) {

Why waste time iterating over a probably significant number of 
irrelevant bits?
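The walk could be bounded by the implemented counter count (num_adj_counters, plus the cycle counter slot where relevant) instead of CORESIGHT_PMU_MAX_HW_CNTRS. A userspace sketch of the idea:

```c
#include <limits.h>

#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Walk set bits only up to 'size', the number of counters actually
 * implemented, instead of the architectural maximum of 256. */
static int find_next_set_bit(const unsigned long *map, int size, int start)
{
	for (int i = start; i < size; i++)
		if (map[i / BITS_PER_LONG] & (1UL << (i % BITS_PER_LONG)))
			return i;
	return size;
}
```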

> +		struct perf_event *event = coresight_pmu->hw_events.events[idx];
> +
> +		if (!event)
> +			continue;
> +
> +		coresight_pmu_event_update(event);
> +		coresight_pmu_set_event_period(event);
> +
> +		handled = true;
> +	}
> +
> +done:
> +	coresight_pmu_start_counters(coresight_pmu);
> +	return IRQ_RETVAL(handled);
> +}
> +
> +static int coresight_pmu_request_irq(struct coresight_pmu *coresight_pmu)
> +{
> +	int irq, ret;
> +	struct device *dev;
> +	struct platform_device *pdev;
> +	struct acpi_apmt_node *apmt_node;
> +
> +	dev = coresight_pmu->dev;
> +	pdev = to_platform_device(dev);
> +	apmt_node = coresight_pmu->apmt_node;
> +
> +	/* Skip IRQ request if the PMU does not support overflow interrupt. */
> +	if (apmt_node->ovflw_irq == 0)
> +		return 0;
> +
> +	irq = platform_get_irq(pdev, 0);
> +	if (irq < 0)
> +		return irq;
> +
> +	ret = devm_request_irq(dev, irq, coresight_pmu_handle_irq,
> +			       IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
> +			       coresight_pmu);
> +	if (ret) {
> +		dev_err(dev, "Could not request IRQ %d\n", irq);
> +		return ret;
> +	}
> +
> +	coresight_pmu->irq = irq;
> +
> +	return 0;
> +}
> +
> +static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
> +{
> +	u32 acpi_uid;
> +	struct device *cpu_dev = get_cpu_device(cpu);
> +	struct acpi_device *acpi_dev = ACPI_COMPANION(cpu_dev);
> +	int level = 0;
> +
> +	if (!cpu_dev)
> +		return -ENODEV;
> +
> +	while (acpi_dev) {
> +		if (!strcmp(acpi_device_hid(acpi_dev),
> +			    ACPI_PROCESSOR_CONTAINER_HID) &&
> +		    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
> +		    acpi_uid == container_uid)
> +			return 0;
> +
> +		acpi_dev = acpi_dev->parent;
> +		level++;
> +	}
> +
> +	return -ENODEV;
> +}
> +
> +static int coresight_pmu_get_cpus(struct coresight_pmu *coresight_pmu)
> +{
> +	struct acpi_apmt_node *apmt_node;
> +	int affinity_flag;
> +	int cpu;
> +
> +	apmt_node = coresight_pmu->apmt_node;
> +	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
> +
> +	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
> +		for_each_possible_cpu(cpu) {
> +			if (apmt_node->proc_affinity ==
> +			    get_acpi_id_for_cpu(cpu)) {
> +				cpumask_set_cpu(
> +					cpu, &coresight_pmu->associated_cpus);
> +				break;
> +			}
> +		}
> +	} else {
> +		for_each_possible_cpu(cpu) {
> +			if (coresight_pmu_find_cpu_container(
> +				    cpu, apmt_node->proc_affinity))
> +				continue;
> +
> +			cpumask_set_cpu(cpu, &coresight_pmu->associated_cpus);
> +		}
> +	}
> +
> +	return 0;
> +}
> +
> +static int coresight_pmu_register_pmu(struct coresight_pmu *coresight_pmu)
> +{
> +	int ret;
> +	struct attribute_group **attr_groups;
> +
> +	attr_groups = coresight_pmu_alloc_attr_group(coresight_pmu);
> +	if (!attr_groups) {
> +		ret = -ENOMEM;
> +		return ret;
> +	}
> +
> +	ret = cpuhp_state_add_instance(coresight_pmu_cpuhp_state,
> +				       &coresight_pmu->cpuhp_node);
> +	if (ret)
> +		return ret;
> +
> +	coresight_pmu->pmu = (struct pmu){
> +		.task_ctx_nr	= perf_invalid_context,
> +		.module		= THIS_MODULE,
> +		.pmu_enable	= coresight_pmu_enable,
> +		.pmu_disable	= coresight_pmu_disable,
> +		.event_init	= coresight_pmu_event_init,
> +		.add		= coresight_pmu_add,
> +		.del		= coresight_pmu_del,
> +		.start		= coresight_pmu_start,
> +		.stop		= coresight_pmu_stop,
> +		.read		= coresight_pmu_read,
> +		.attr_groups	= (const struct attribute_group **)attr_groups,
> +		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
> +	};
> +
> +	ret = perf_pmu_register(&coresight_pmu->pmu, coresight_pmu->name, -1);
> +	if (ret) {
> +		cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
> +					    &coresight_pmu->cpuhp_node);
> +	}
> +
> +	return ret;
> +}
> +
> +static int coresight_pmu_device_probe(struct platform_device *pdev)
> +{
> +	int ret;
> +	struct coresight_pmu *coresight_pmu;
> +
> +	ret = coresight_pmu_alloc(pdev, &coresight_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = coresight_pmu_init_mmio(coresight_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = coresight_pmu_request_irq(coresight_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = coresight_pmu_get_cpus(coresight_pmu);
> +	if (ret)
> +		return ret;
> +
> +	ret = coresight_pmu_register_pmu(coresight_pmu);
> +	if (ret)
> +		return ret;
> +
> +	return 0;
> +}
> +
> +static int coresight_pmu_device_remove(struct platform_device *pdev)
> +{
> +	struct coresight_pmu *coresight_pmu = platform_get_drvdata(pdev);
> +
> +	perf_pmu_unregister(&coresight_pmu->pmu);
> +	cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
> +				    &coresight_pmu->cpuhp_node);
> +
> +	return 0;
> +}
> +
> +static struct platform_driver coresight_pmu_driver = {
> +	.driver = {
> +			.name = "arm-coresight-pmu",
> +			.suppress_bind_attrs = true,
> +		},
> +	.probe = coresight_pmu_device_probe,
> +	.remove = coresight_pmu_device_remove,
> +};
> +
> +static void coresight_pmu_set_active_cpu(int cpu,
> +					 struct coresight_pmu *coresight_pmu)
> +{
> +	cpumask_set_cpu(cpu, &coresight_pmu->active_cpu);
> +	WARN_ON(irq_set_affinity(coresight_pmu->irq,
> +				 &coresight_pmu->active_cpu));
> +}
> +
> +static int coresight_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> +{
> +	struct coresight_pmu *coresight_pmu =
> +		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
> +
> +	if (!cpumask_test_cpu(cpu, &coresight_pmu->associated_cpus))
> +		return 0;
> +
> +	/* If the PMU is already managed, there is nothing to do */
> +	if (!cpumask_empty(&coresight_pmu->active_cpu))
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	coresight_pmu_set_active_cpu(cpu, coresight_pmu);
> +
> +	return 0;
> +}
> +
> +static int coresight_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> +{
> +	int dst;
> +	struct cpumask online_supported;
> +
> +	struct coresight_pmu *coresight_pmu =
> +		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
> +
> +	/* Nothing to do if this CPU doesn't own the PMU */
> +	if (!cpumask_test_and_clear_cpu(cpu, &coresight_pmu->active_cpu))
> +		return 0;
> +
> +	/* Choose a new CPU to migrate ownership of the PMU to */
> +	cpumask_and(&online_supported, &coresight_pmu->associated_cpus,
> +		    cpu_online_mask);
> +	dst = cpumask_any_but(&online_supported, cpu);
> +	if (dst >= nr_cpu_ids)
> +		return 0;
> +
> +	/* Use this CPU for event counting */
> +	perf_pmu_migrate_context(&coresight_pmu->pmu, cpu, dst);
> +	coresight_pmu_set_active_cpu(dst, coresight_pmu);
> +
> +	return 0;
> +}
> +
> +static int __init coresight_pmu_init(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, PMUNAME,
> +				      coresight_pmu_cpu_online,
> +				      coresight_pmu_cpu_teardown);
> +	if (ret < 0)
> +		return ret;
> +	coresight_pmu_cpuhp_state = ret;
> +	return platform_driver_register(&coresight_pmu_driver);
> +}
> +
> +static void __exit coresight_pmu_exit(void)
> +{
> +	platform_driver_unregister(&coresight_pmu_driver);
> +	cpuhp_remove_multi_state(coresight_pmu_cpuhp_state);
> +}
> +
> +module_init(coresight_pmu_init);
> +module_exit(coresight_pmu_exit);
> diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.h b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
> new file mode 100644
> index 000000000000..59fb40eafe45
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
> @@ -0,0 +1,147 @@
> +/* SPDX-License-Identifier: GPL-2.0
> + *
> + * ARM CoreSight PMU driver.
> + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + *
> + */
> +
> +#ifndef __ARM_CORESIGHT_PMU_H__
> +#define __ARM_CORESIGHT_PMU_H__
> +
> +#include <linux/acpi.h>
> +#include <linux/bitfield.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <linux/types.h>
> +
> +#define to_coresight_pmu(p) (container_of(p, struct coresight_pmu, pmu))
> +
> +#define CORESIGHT_EXT_ATTR(_name, _func, _config)			\
> +	(&((struct dev_ext_attribute[]){				\
> +		{							\
> +			.attr = __ATTR(_name, 0444, _func, NULL),	\
> +			.var = (void *)_config				\
> +		}							\
> +	})[0].attr.attr)
> +
> +#define CORESIGHT_FORMAT_ATTR(_name, _config)				\
> +	CORESIGHT_EXT_ATTR(_name, coresight_pmu_sysfs_format_show,	\
> +			   (char *)_config)
> +
> +#define CORESIGHT_EVENT_ATTR(_name, _config)				\
> +	PMU_EVENT_ATTR_ID(_name, coresight_pmu_sysfs_event_show, _config)
> +
> +/**
> + * This is the default event number for cycle count, if supported, since the
> + * ARM Coresight PMU specification does not define a standard event code
> + * for cycle count.
> + */
> +#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 31)

And what do we do when an implementation defines 0x80000000 as one of 
its own event specifiers? The standard cycle count is independent of any 
other events, so it needs to be encoded in a manner which is distinct 
from *any* potentially-valid PMEVTYPER value.
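One way to make it distinct, sketched with a hypothetical encoding: flag the cycle counter in a bit of perf_event_attr.config above the 32-bit PMEVTYPER space, so no implementation-defined event value (including 0x80000000) can ever collide with it:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical encoding: bit 32 of attr.config marks the cycle counter,
 * leaving all 2^32 PMEVTYPER values free for implementation events. */
#define EVT_CYCLES_FLAG (1ULL << 32)

static bool is_cycle_event(uint64_t config)
{
	return (config & EVT_CYCLES_FLAG) != 0;
}

static uint32_t event_type(uint64_t config)
{
	return (uint32_t)config;	/* low 32 bits program PMEVTYPER */
}
```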

> +
> +/**
> + * The ARM Coresight PMU supports up to 256 event counters.
> + * If the counters are larger-than 32-bits, then the PMU includes at
> + * most 128 counters.
> + */
> +#define CORESIGHT_PMU_MAX_HW_CNTRS 256
> +
> +/* The cycle counter, if implemented, is located at counter[31]. */
> +#define CORESIGHT_PMU_IDX_CCNTR 31
> +
> +struct coresight_pmu;
> +
> +/* This tracks the events assigned to each counter in the PMU. */
> +struct coresight_pmu_hw_events {
> +	/* The events that are active on the PMU for the given index. */
> +	struct perf_event *events[CORESIGHT_PMU_MAX_HW_CNTRS];

This is really quite big - 2KB per PMU on 64-bit - given the likelihood 
that typically only a fraction of that might be needed. As mentioned, it 
should already be tickling CONFIG_FRAME_WARN in 
coresight_pmu_validate_group().
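To put a number on it, the events[] array alone is 256 pointers, i.e. 2KB with 8-byte pointers, before the bitmap is even counted:

```c
#include <limits.h>

#define MAX_HW_CNTRS  256
#define BITS_PER_LONG (CHAR_BIT * sizeof(unsigned long))

/* Mirror of the structure that lands on the stack in validate_group(). */
struct fake_hw_events {
	void *events[MAX_HW_CNTRS];	/* 2KB on a 64-bit target */
	unsigned long used_ctrs[MAX_HW_CNTRS / BITS_PER_LONG];
};
```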

Thanks,
Robin.

> +	/* Each bit indicates a counter is being used (or not) for an event. */
> +	DECLARE_BITMAP(used_ctrs, CORESIGHT_PMU_MAX_HW_CNTRS);
> +};
> +
> +/* Contains ops to query vendor/implementer specific attribute. */
> +struct coresight_pmu_impl_ops {
> +	/* Get event attributes */
> +	struct attribute **(*get_event_attrs)(
> +		const struct coresight_pmu *coresight_pmu);
> +	/* Get format attributes */
> +	struct attribute **(*get_format_attrs)(
> +		const struct coresight_pmu *coresight_pmu);
> +	/* Get string identifier */
> +	const char *(*get_identifier)(const struct coresight_pmu *coresight_pmu);
> +	/* Check if the event corresponds to cycle count event */
> +	bool (*is_cc_event)(const struct perf_event *event);
> +	/* Decode event type/id from configs */
> +	u32 (*event_type)(const struct perf_event *event);
> +	/* Decode filter value from configs */
> +	u32 (*event_filter)(const struct perf_event *event);
> +};
> +
> +/* Vendor/implementer descriptor. */
> +struct coresight_pmu_impl {
> +	u32 pmiidr;
> +	const struct coresight_pmu_impl_ops *ops;
> +};
> +
> +/* Coresight PMU descriptor. */
> +struct coresight_pmu {
> +	struct pmu pmu;
> +	struct device *dev;
> +	struct acpi_apmt_node *apmt_node;
> +	const char *name;
> +	const char *identifier;
> +	void __iomem *base0;
> +	void __iomem *base1;
> +	int irq;
> +	cpumask_t associated_cpus;
> +	cpumask_t active_cpu;
> +	struct hlist_node cpuhp_node;
> +
> +	u32 pmcfgr;
> +	u32 num_adj_counters;
> +	u32 num_set_clr_reg;
> +
> +	struct coresight_pmu_hw_events hw_events;
> +
> +	void (*write_reg64)(u64 val, void __iomem *base, u32 offset);
> +	u64 (*read_reg64)(void __iomem *base, u32 offset);
> +
> +	struct coresight_pmu_impl impl;
> +};
> +
> +/* Default function to show event attribute in sysfs. */
> +ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
> +				       struct device_attribute *attr,
> +				       char *buf);
> +
> +/* Default function to show format attribute in sysfs. */
> +ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf);
> +
> +/* Get the default Coresight PMU event attributes. */
> +struct attribute **
> +coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu);
> +
> +/* Get the default Coresight PMU format attributes. */
> +struct attribute **
> +coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu);
> +
> +/* Get the default Coresight PMU device identifier. */
> +const char *
> +coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu);
> +
> +/* Default function to query if an event is a cycle counter event. */
> +bool coresight_pmu_is_cc_event(const struct perf_event *event);
> +
> +/* Default function to query the type/id of an event. */
> +u32 coresight_pmu_event_type(const struct perf_event *event);
> +
> +/* Default function to query the filter value of an event. */
> +u32 coresight_pmu_event_filter(const struct perf_event *event);
> +
> +#endif /* __ARM_CORESIGHT_PMU_H__ */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-09 10:02   ` Suzuki K Poulose
@ 2022-05-09 12:20     ` Shaokun Zhang
  2022-05-09 22:07     ` Besar Wicaksono
  2022-05-10 11:07     ` Sudeep Holla
  2 siblings, 0 replies; 31+ messages in thread
From: Shaokun Zhang @ 2022-05-09 12:20 UTC (permalink / raw)
  To: Suzuki K Poulose, Will Deacon, Besar Wicaksono
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, sudeep.holla, thanu.rangarajan, Michael.Williams,
	treding, jonathanh, vsethi, Mathieu Poirier

Hi,

On 2022/5/9 18:02, Suzuki K Poulose wrote:
> Cc: Mike Williams, Mathieu Poirier
> 
> On 09/05/2022 10:28, Will Deacon wrote:
>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
>>> Add driver support for ARM CoreSight PMU device and event attributes for NVIDIA
>>> implementation. The code is based on ARM Coresight PMU architecture and ACPI ARM
>>> Performance Monitoring Unit table (APMT) specification below:
>>>   * ARM Coresight PMU:
>>>          https://developer.arm.com/documentation/ihi0091/latest
>>>   * APMT: https://developer.arm.com/documentation/den0117/latest
>>>
>>> Notes:
>>>   * There is a concern on the naming of the PMU device.
>>>     Currently the driver is probing "arm-coresight-pmu" device, however the APMT
>>>     spec supports different kinds of CoreSight PMU based implementation. So it is
>>>     open for discussion if the name can stay or a "generic" name is required.
>>>     Please see the following thread:
>>>     http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html
>>>
>>> Besar Wicaksono (2):
>>>    perf: coresight_pmu: Add support for ARM CoreSight PMU driver
>>>    perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
>>>
>>>   arch/arm64/configs/defconfig                  |    1 +
>>>   drivers/perf/Kconfig                          |    2 +
>>>   drivers/perf/Makefile                         |    1 +
>>>   drivers/perf/coresight_pmu/Kconfig            |   10 +
>>>   drivers/perf/coresight_pmu/Makefile           |    7 +
>>>   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
>>>   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>>>   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
>>>   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>>>   9 files changed, 1802 insertions(+)
>>
>> How does this interact with all the stuff we have under
>> drivers/hwtracing/coresight/?
> 
> Absolutely zero, except for the name. The standard
> is named "CoreSight PMU" which is a bit unfortunate,
> given the only link, AFAIU, with the "CoreSight" architecture
> is the Lock Access Register(LAR). For reference, the
> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> tracing and the PMU is called "cs_etm" (expands to coresight etm).
> Otherwise the standard doesn't have anything to do with what
> exists already in the kernel.
> 
> That said, I am concerned that the "coresight_pmu" is easily confused
> with what exists today. Given that this is more of a "PMU" standard
> for the IPs in the Arm world, it would be better to name it as such
> avoiding any confusion with the existing PMUs.
> 
> One potential recommendation for the name is, "Arm PMU"  (The ACPI table is named Arm PMU Table).
> But then that could be clashing with the armv8_pmu :-(.
> 
> Some of the other options are :
> 
> "Arm Generic PMU"
> "Arm Uncore PMU"

To be honest, if we want to distinguish this from the Arm core PMU, "Uncore PMU" is
the better choice in my opinion. On x86, both Intel and AMD also name their
uncore PMU drivers with an "uncore_" prefix.

Thanks,
Shaokun

> "Arm PMU"
> 
> Suzuki
> 
>>
>> Will
> 
> 
> _______________________________________________
> linux-arm-kernel mailing list
> linux-arm-kernel@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
> .

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-09 10:02   ` Suzuki K Poulose
  2022-05-09 12:20     ` Shaokun Zhang
@ 2022-05-09 22:07     ` Besar Wicaksono
  2022-05-10 11:07     ` Sudeep Holla
  2 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-09 22:07 UTC (permalink / raw)
  To: Suzuki K Poulose, Will Deacon
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, sudeep.holla, thanu.rangarajan, Michael.Williams,
	Thierry Reding, Jonathan Hunter, Vikram Sethi, Mathieu Poirier,
	Michael Williams (ATG)



> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> Sent: Monday, May 9, 2022 5:02 AM
> To: Will Deacon <will@kernel.org>; Besar Wicaksono
> <bwicaksono@nvidia.com>
> Cc: catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; Mathieu Poirier <mathieu.poirier@linaro.org>;
> Michael Williams (ATG) <Michael.Williams@arm.com>
> Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
> 
> External email: Use caution opening links or attachments
> 
> 
> Cc: Mike Williams, Mathieu Poirier
> 
> On 09/05/2022 10:28, Will Deacon wrote:
> > On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> >> Add driver support for ARM CoreSight PMU device and event attributes
> for NVIDIA
> >> implementation. The code is based on ARM Coresight PMU architecture
> and ACPI ARM
> >> Performance Monitoring Unit table (APMT) specification below:
> >>   * ARM Coresight PMU:
> >>          https://developer.arm.com/documentation/ihi0091/latest
> >>   * APMT: https://developer.arm.com/documentation/den0117/latest
> >>
> >> Notes:
> >>   * There is a concern on the naming of the PMU device.
> >>     Currently the driver is probing "arm-coresight-pmu" device, however
> the APMT
> >>     spec supports different kinds of CoreSight PMU based implementation.
> So it is
> >>     open for discussion if the name can stay or a "generic" name is required.
> >>     Please see the following thread:
> >>     http://lists.infradead.org/pipermail/linux-arm-kernel/2022-
> May/740485.html
> >>
> >> Besar Wicaksono (2):
> >>    perf: coresight_pmu: Add support for ARM CoreSight PMU driver
> >>    perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
> >>
> >>   arch/arm64/configs/defconfig                  |    1 +
> >>   drivers/perf/Kconfig                          |    2 +
> >>   drivers/perf/Makefile                         |    1 +
> >>   drivers/perf/coresight_pmu/Kconfig            |   10 +
> >>   drivers/perf/coresight_pmu/Makefile           |    7 +
> >>   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317
> +++++++++++++++++
> >>   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> >>   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> >>   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> >>   9 files changed, 1802 insertions(+)
> >
> > How does this interact with all the stuff we have under
> > drivers/hwtracing/coresight/?
> 
> Absolutely zero, except for the name. The standard
> is named "CoreSight PMU" which is a bit unfortunate,
> given the only link, AFAIU, with the "CoreSight" architecture
> is the Lock Access Register(LAR). For reference, the
> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> tracing and the PMU is called "cs_etm" (expands to coresight etm).
> Otherwise the standard doesn't have anything to do with what
> exists already in the kernel.

Yes, there is no correlation or interaction with the existing drivers for ETM/STM tracing.
It might share the same authentication interface (NIDEN, DBGEN, SPIDEN, SPNIDEN)
for external debug, but these signals are not used in the driver.

> 
> That said, I am concerned that the "coresight_pmu" is easily confused
> with what exists today. Given that this is more of a "PMU" standard
> for the IPs in the Arm world, it would be better to name it as such
> avoiding any confusion with the existing PMUs.
> 
> One potential recommendation for the name is, "Arm PMU"  (The ACPI table
> is named Arm PMU Table). But then that could be clashing with the
> armv8_pmu :-(.
> 
> Some of the other options are :
> 
> "Arm Generic PMU"
> "Arm Uncore PMU"
> "Arm PMU"

As far as I understand, the APMT does not cover the PMUs in the Arm PEs.
I think "Arm Uncore PMU" would fit.

Regards,
Besar

> 
> Suzuki
> 
> >
> > Will


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-09 10:02   ` Suzuki K Poulose
  2022-05-09 12:20     ` Shaokun Zhang
  2022-05-09 22:07     ` Besar Wicaksono
@ 2022-05-10 11:07     ` Sudeep Holla
  2022-05-10 11:13       ` Will Deacon
  2 siblings, 1 reply; 31+ messages in thread
From: Sudeep Holla @ 2022-05-10 11:07 UTC (permalink / raw)
  To: Suzuki K Poulose, Besar Wicaksono
  Cc: Will Deacon, catalin.marinas, mark.rutland, linux-arm-kernel,
	linux-kernel, linux-tegra, thanu.rangarajan, Michael.Williams,
	treding, jonathanh, vsethi, Mathieu Poirier

On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> Cc: Mike Williams, Mathieu Poirier
> 
> On 09/05/2022 10:28, Will Deacon wrote:
> > On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> > > Add driver support for ARM CoreSight PMU device and event attributes for NVIDIA
> > > implementation. The code is based on ARM Coresight PMU architecture and ACPI ARM
> > > Performance Monitoring Unit table (APMT) specification below:
> > >   * ARM Coresight PMU:
> > >          https://developer.arm.com/documentation/ihi0091/latest
> > >   * APMT: https://developer.arm.com/documentation/den0117/latest
> > > 
> > > Notes:
> > >   * There is a concern on the naming of the PMU device.
> > >     Currently the driver is probing "arm-coresight-pmu" device, however the APMT
> > >     spec supports different kinds of CoreSight PMU based implementation. So it is
> > >     open for discussion if the name can stay or a "generic" name is required.
> > >     Please see the following thread:
> > >     http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html
> > > 
> > > Besar Wicaksono (2):
> > >    perf: coresight_pmu: Add support for ARM CoreSight PMU driver
> > >    perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
> > > 
> > >   arch/arm64/configs/defconfig                  |    1 +
> > >   drivers/perf/Kconfig                          |    2 +
> > >   drivers/perf/Makefile                         |    1 +
> > >   drivers/perf/coresight_pmu/Kconfig            |   10 +
> > >   drivers/perf/coresight_pmu/Makefile           |    7 +
> > >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
> > >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> > >   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> > >   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> > >   9 files changed, 1802 insertions(+)
> > 
> > How does this interact with all the stuff we have under
> > drivers/hwtracing/coresight/?
> 
> Absolutely zero, except for the name. The standard
> is named "CoreSight PMU" which is a bit unfortunate,
> given the only link, AFAIU, with the "CoreSight" architecture
> is the Lock Access Register(LAR). For reference, the
> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> tracing and the PMU is called "cs_etm" (expands to coresight etm).
> Otherwise the standard doesn't have anything to do with what
> exists already in the kernel.
> 
> That said, I am concerned that the "coresight_pmu" is easily confused
> with what exists today. Given that this is more of a "PMU" standard
> for the IPs in the Arm world, it would be better to name it as such
> avoiding any confusion with the existing PMUs.
> 

Thanks Suzuki. I did suggest something similar[1] but asked for the name to be
retained so that it could be discussed in the bigger and more appropriate forum.

> One potential recommendation for the name is, "Arm PMU"  (The ACPI table is
> named Arm PMU Table). But then that could be clashing with the armv8_pmu
> :-(.
> 
> Some of the other options are :
> 
> "Arm Generic PMU"
> "Arm Uncore PMU"

I wasn't sure whether there is any restriction on the usage of this term on Arm
and hence didn't make the suggestion. But if allowed, this would be my
choice too.

-- 
Regards,
Sudeep

[1] https://lore.kernel.org/lkml/20220504182633.a3mwuiohfqtjvpep@bogus

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-10 11:07     ` Sudeep Holla
@ 2022-05-10 11:13       ` Will Deacon
  2022-05-10 18:40         ` Sudeep Holla
  2022-05-11  8:44         ` Suzuki K Poulose
  0 siblings, 2 replies; 31+ messages in thread
From: Will Deacon @ 2022-05-10 11:13 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Suzuki K Poulose, Besar Wicaksono, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, thanu.rangarajan,
	Michael.Williams, treding, jonathanh, vsethi, Mathieu Poirier

On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> > Cc: Mike Williams, Mathieu Poirier
> > On 09/05/2022 10:28, Will Deacon wrote:
> > > On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> > > >   arch/arm64/configs/defconfig                  |    1 +
> > > >   drivers/perf/Kconfig                          |    2 +
> > > >   drivers/perf/Makefile                         |    1 +
> > > >   drivers/perf/coresight_pmu/Kconfig            |   10 +
> > > >   drivers/perf/coresight_pmu/Makefile           |    7 +
> > > >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
> > > >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> > > >   9 files changed, 1802 insertions(+)
> > > 
> > > How does this interact with all the stuff we have under
> > > drivers/hwtracing/coresight/?
> > 
> > Absolutely zero, except for the name. The standard
> > is named "CoreSight PMU" which is a bit unfortunate,
> > given the only link, AFAIU, with the "CoreSight" architecture
> > is the Lock Access Register(LAR). For reference, the
> > drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> > tracing and the PMU is called "cs_etm" (expands to coresight etm).
> > Otherwise the standard doesn't have anything to do with what
> > exists already in the kernel.

That's... a poor naming choice! But good, if it's entirely separate then I
don't have to worry about that. Just wanted to make sure we're not going to
get tangled up in things like ROM tables and Coresight power domains for
these things.

> > One potential recommendation for the name is, "Arm PMU"  (The ACPI table is
> > named Arm PMU Table). But then that could be clashing with the armv8_pmu
> > :-(.
> > 
> > Some of the other options are :
> > 
> > "Arm Generic PMU"
> > "Arm Uncore PMU"
> 
> I wasn't sure on this if there is any restriction on usage of this on Arm
> and hence didn't make the suggestion. But if allowed, this would be my
> choice too.

We'd taken to calling them "System" PMUs in the past, so maybe just stick
with that? I think "Uncore" is Intel terminology, so it's probably best to
avoid it for non-Intel parts.

Will

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-10 11:13       ` Will Deacon
@ 2022-05-10 18:40         ` Sudeep Holla
  2022-05-11  1:29           ` Besar Wicaksono
  2022-05-11  8:44         ` Suzuki K Poulose
  1 sibling, 1 reply; 31+ messages in thread
From: Sudeep Holla @ 2022-05-10 18:40 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: Suzuki K Poulose, Will Deacon, Sudeep Holla, catalin.marinas,
	mark.rutland, linux-arm-kernel, linux-kernel, linux-tegra,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	Mathieu Poirier

On Tue, May 10, 2022 at 12:13:19PM +0100, Will Deacon wrote:
> On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
> > On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> > > Cc: Mike Williams, Mathieu Poirier
> > > On 09/05/2022 10:28, Will Deacon wrote:
> > > > On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> > > > >   arch/arm64/configs/defconfig                  |    1 +
> > > > >   drivers/perf/Kconfig                          |    2 +
> > > > >   drivers/perf/Makefile                         |    1 +
> > > > >   drivers/perf/coresight_pmu/Kconfig            |   10 +
> > > > >   drivers/perf/coresight_pmu/Makefile           |    7 +
> > > > >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
> > > > >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> > > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> > > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> > > > >   9 files changed, 1802 insertions(+)
> > > > 
> > > > How does this interact with all the stuff we have under
> > > > drivers/hwtracing/coresight/?
> > > 
> > > Absolutely zero, except for the name. The standard
> > > is named "CoreSight PMU" which is a bit unfortunate,
> > > given the only link, AFAIU, with the "CoreSight" architecture
> > > is the Lock Access Register(LAR). For reference, the
> > > drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> > > tracing and the PMU is called "cs_etm" (expands to coresight etm).
> > > Otherwise the standard doesn't have anything to do with what
> > > exists already in the kernel.
>
> That's... a poor naming choice! But good, if it's entirely separate then I
> don't have to worry about that. Just wanted to make sure we're not going to
> get tangled up in things like ROM tables and Coresight power domains for
> these things.
>

OK, now that triggered another question/thought.

1. Do you need to do active power management for these PMUs? Or, like
   CPU PMUs, do you reject entering low-power states while there is an active
   session in progress? If there is an active session, runtime PM won't get
   triggered, but if there is a system-wide suspend, how is that dealt with?

2. Assuming you need some sort of PM, and since this is a static table (which
   I really don't like/prefer, but it is out there 🙁), how do you plan to
   get the power-domain-related information?

-- 
Regards,
Sudeep

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-10 18:40         ` Sudeep Holla
@ 2022-05-11  1:29           ` Besar Wicaksono
  2022-05-11 12:42             ` Robin Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-11  1:29 UTC (permalink / raw)
  To: Sudeep Holla
  Cc: Suzuki K Poulose, Will Deacon, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, thanu.rangarajan,
	Michael.Williams, Thierry Reding, Jonathan Hunter, Vikram Sethi,
	Mathieu Poirier



> -----Original Message-----
> From: Sudeep Holla <sudeep.holla@arm.com>
> Sent: Tuesday, May 10, 2022 1:40 PM
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>; Will Deacon
> <will@kernel.org>; Sudeep Holla <sudeep.holla@arm.com>;
> catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; thanu.rangarajan@arm.com;
> Michael.Williams@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>;
> Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
> 
> On Tue, May 10, 2022 at 12:13:19PM +0100, Will Deacon wrote:
> > On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
> > > On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> > > > Cc: Mike Williams, Mathieu Poirier
> > > > On 09/05/2022 10:28, Will Deacon wrote:
> > > > > On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> > > > > >   arch/arm64/configs/defconfig                  |    1 +
> > > > > >   drivers/perf/Kconfig                          |    2 +
> > > > > >   drivers/perf/Makefile                         |    1 +
> > > > > >   drivers/perf/coresight_pmu/Kconfig            |   10 +
> > > > > >   drivers/perf/coresight_pmu/Makefile           |    7 +
> > > > > >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317
> +++++++++++++++++
> > > > > >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> > > > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> > > > > >   .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> > > > > >   9 files changed, 1802 insertions(+)
> > > > >
> > > > > How does this interact with all the stuff we have under
> > > > > drivers/hwtracing/coresight/?
> > > >
> > > > Absolutely zero, except for the name. The standard
> > > > is named "CoreSight PMU" which is a bit unfortunate,
> > > > given the only link, AFAIU, with the "CoreSight" architecture
> > > > is the Lock Access Register(LAR). For reference, the
> > > > drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> > > > tracing and the PMU is called "cs_etm" (expands to coresight etm).
> > > > Otherwise the standard doesn't have anything to do with what
> > > > exists already in the kernel.
> >
> > That's... a poor naming choice! But good, if it's entirely separate then I
> > don't have to worry about that. Just wanted to make sure we're not going
> to
> > get tangled up in things like ROM tables and Coresight power domains for
> > these things.
> >
> 
> OK, now that triggered another question/thought.
> 
> 1. Do you need to do active power management for these PMUs ? Or like
>    CPU PMUs, do you reject entering low power states if there is active
>    session in progress. If there is active session, runtime PM won't get
>    triggered but if there is system wide suspend, how is that dealt with ?
> 

Looking at the other uncore/system PMUs, none of the drivers support PM ops.
The NVIDIA system PMU also does not get power gated, and system suspend is not
supported. But just like the other uncore PMU drivers, this driver supports CPU hotplug.
If PM is needed, the required info should be expressed in ACPI.

> 2. Assuming you need some sort of PM, and since this is static table(which
>    I really don't like/prefer but it is out there 🙁), how do you plan to
>    get the power domain related information.
> 

I guess the APMT spec in section 2.2 may cover this. If a PMU implementation has
properties beyond what is defined in the spec, these properties can be described in the DSDT.
The driver doesn't take care of this currently, so this is room for future improvement.

Regards,
Besar

> --
> Regards,
> Sudeep

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-09 12:13   ` Robin Murphy
@ 2022-05-11  2:46     ` Besar Wicaksono
  2022-05-11 10:03       ` Robin Murphy
  0 siblings, 1 reply; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-11  2:46 UTC (permalink / raw)
  To: Robin Murphy
  Cc: catalin.marinas, will, mark.rutland, linux-arm-kernel,
	linux-kernel, linux-tegra, sudeep.holla, thanu.rangarajan,
	Michael.Williams, suzuki.poulose, Thierry Reding,
	Jonathan Hunter, Vikram Sethi



> -----Original Message-----
> From: Robin Murphy <robin.murphy@arm.com>
> Sent: Monday, May 9, 2022 7:13 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; catalin.marinas@arm.com;
> will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com;
> suzuki.poulose@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>
> Subject: Re: [PATCH 1/2] perf: coresight_pmu: Add support for ARM
> CoreSight PMU driver
> 
> On 2022-05-09 01:28, Besar Wicaksono wrote:
> > Add support for ARM CoreSight PMU driver framework and interfaces.
> > The driver provides generic implementation to operate uncore PMU based
> > on ARM CoreSight PMU architecture. The driver also provides interface
> > to get vendor/implementation specific information, for example event
> > attributes and formating.
> >
> > The specification used in this implementation can be found below:
> >   * ACPI Arm Performance Monitoring Unit table:
> >          https://developer.arm.com/documentation/den0117/latest
> >   * ARM Coresight PMU architecture:
> >          https://developer.arm.com/documentation/ihi0091/latest
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >   arch/arm64/configs/defconfig                  |    1 +
> >   drivers/perf/Kconfig                          |    2 +
> >   drivers/perf/Makefile                         |    1 +
> >   drivers/perf/coresight_pmu/Kconfig            |   10 +
> >   drivers/perf/coresight_pmu/Makefile           |    6 +
> >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1315
> +++++++++++++++++
> >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> >   7 files changed, 1482 insertions(+)
> >   create mode 100644 drivers/perf/coresight_pmu/Kconfig
> >   create mode 100644 drivers/perf/coresight_pmu/Makefile
> >   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
> >   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index 2ca8b1b336d2..8f2120182b25 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
> >   CONFIG_PHY_TEGRA_XUSB=y
> >   CONFIG_PHY_AM654_SERDES=m
> >   CONFIG_PHY_J721E_WIZ=m
> > +CONFIG_ARM_CORESIGHT_PMU=y
> >   CONFIG_ARM_SMMU_V3_PMU=m
> >   CONFIG_FSL_IMX8_DDR_PMU=m
> >   CONFIG_QCOM_L2_PMU=y
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 1e2d69453771..c4e7cd5b4162 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
> >         Enable perf support for Marvell DDR Performance monitoring
> >         event on CN10K platform.
> >
> > +source "drivers/perf/coresight_pmu/Kconfig"
> > +
> >   endmenu
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index 57a279c61df5..4126a04b5583 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) +=
> arm_dmc620_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) +=
> marvell_cn10k_tad_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) +=
> marvell_cn10k_ddr_pmu.o
> >   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
> > diff --git a/drivers/perf/coresight_pmu/Kconfig
> b/drivers/perf/coresight_pmu/Kconfig
> > new file mode 100644
> > index 000000000000..487dfee71ad1
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/Kconfig
> > @@ -0,0 +1,10 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +
> > +config ARM_CORESIGHT_PMU
> > +     tristate "ARM Coresight PMU"
> > +     depends on ARM64 && ACPI_APMT
> 
> There shouldn't be any functional dependency on any CPU architecture here.

The spec is targeted towards Arm-based systems, so shouldn't we explicitly limit it to Arm?

> 
> > +     help
> > +       Provides support for Performance Monitoring Unit (PMU) events
> based on
> > +       ARM CoreSight PMU architecture.
> > \ No newline at end of file
> > diff --git a/drivers/perf/coresight_pmu/Makefile
> b/drivers/perf/coresight_pmu/Makefile
> > new file mode 100644
> > index 000000000000..a2a7a5fbbc16
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/Makefile
> > @@ -0,0 +1,6 @@
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +#
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
> > +     arm_coresight_pmu.o
> > diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> > new file mode 100644
> > index 000000000000..1e9553d29717
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> > @@ -0,0 +1,1315 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * ARM CoreSight PMU driver.
> > + *
> > + * This driver adds support for uncore PMU based on ARM CoreSight
> Performance
> > + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers
> and
> > + * like other uncore PMUs, it does not support process specific events and
> > + * cannot be used in sampling mode.
> > + *
> > + * This code is based on other uncore PMUs like ARM DSU PMU. It
> provides a
> > + * generic implementation to operate the PMU according to CoreSight
> PMU
> > + * architecture and ACPI ARM PMU table (APMT) documents below:
> > + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091
> A.a-00bet0.
> > + *   - APMT document number: ARM DEN0117.
> > + * The description of the PMU, like the PMU device identification,
> available
> > + * events, and configuration options, is vendor specific. The driver
> provides
> > + * interface for vendor specific code to get this information. This allows
> the
> > + * driver to be shared with PMU from different vendors.
> > + *
> > + * CoreSight PMU devices are named as arm_coresight_pmu<node_id>
> where <node_id>
> > + * is APMT node id. The description of the device, like the identifier,
> > + * supported events, and formats can be found in sysfs
> > + * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
> > + *
> > + * The user should refer to the vendor technical documentation to get
> details
> > + * about the supported events.
> > + *
> > + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > + *
> > + */
> > +
> > +#include <linux/acpi.h>
> > +#include <linux/cacheinfo.h>
> > +#include <linux/ctype.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/module.h>
> > +#include <linux/perf_event.h>
> > +#include <linux/platform_device.h>
> > +#include <acpi/processor.h>
> > +
> > +#include "arm_coresight_pmu.h"
> > +
> > +#define PMUNAME "arm_coresight_pmu"
> > +
> > +#define CORESIGHT_CPUMASK_ATTR(_name, _config)                               \
> > +     CORESIGHT_EXT_ATTR(_name, coresight_pmu_cpumask_show,           \
> > +                        (unsigned long)_config)
> > +
> > +/**
> > + * Register offsets based on CoreSight Performance Monitoring Unit
> Architecture
> > + * Document number: ARM-ECM-0640169 00alp6
> > + */
> > +#define PMEVCNTR_LO                                  0x0
> > +#define PMEVCNTR_HI                                  0x4
> > +#define PMEVTYPER                                    0x400
> > +#define PMCCFILTR                                    0x47C
> > +#define PMEVFILTR                                    0xA00
> > +#define PMCNTENSET                                   0xC00
> > +#define PMCNTENCLR                                   0xC20
> > +#define PMINTENSET                                   0xC40
> > +#define PMINTENCLR                                   0xC60
> > +#define PMOVSCLR                                     0xC80
> > +#define PMOVSSET                                     0xCC0
> > +#define PMCFGR                                               0xE00
> > +#define PMCR                                         0xE04
> > +#define PMIIDR                                               0xE08
> > +
> > +/* PMCFGR register field */
> > +#define PMCFGR_NCG_SHIFT                             28
> > +#define PMCFGR_NCG_MASK                                      0xf
> > +#define PMCFGR_HDBG                                  BIT(24)
> > +#define PMCFGR_TRO                                   BIT(23)
> > +#define PMCFGR_SS                                    BIT(22)
> > +#define PMCFGR_FZO                                   BIT(21)
> > +#define PMCFGR_MSI                                   BIT(20)
> > +#define PMCFGR_UEN                                   BIT(19)
> > +#define PMCFGR_NA                                    BIT(17)
> > +#define PMCFGR_EX                                    BIT(16)
> > +#define PMCFGR_CCD                                   BIT(15)
> > +#define PMCFGR_CC                                    BIT(14)
> > +#define PMCFGR_SIZE_SHIFT                            8
> > +#define PMCFGR_SIZE_MASK                             0x3f
> > +#define PMCFGR_N_SHIFT                                       0
> > +#define PMCFGR_N_MASK                                        0xff
> > +
> > +/* PMCR register field */
> > +#define PMCR_TRO                                     BIT(11)
> > +#define PMCR_HDBG                                    BIT(10)
> > +#define PMCR_FZO                                     BIT(9)
> > +#define PMCR_NA                                              BIT(8)
> > +#define PMCR_DP                                              BIT(5)
> > +#define PMCR_X                                               BIT(4)
> > +#define PMCR_D                                               BIT(3)
> > +#define PMCR_C                                               BIT(2)
> > +#define PMCR_P                                               BIT(1)
> > +#define PMCR_E                                               BIT(0)
> > +
> > +/* PMIIDR register field */
> > +#define PMIIDR_IMPLEMENTER_MASK                              0xFFF
> > +#define PMIIDR_PRODUCTID_MASK                                0xFFF
> > +#define PMIIDR_PRODUCTID_SHIFT                               20
> > +
> > +/* Each SET/CLR register supports up to 32 counters. */
> > +#define CORESIGHT_SET_CLR_REG_COUNTER_NUM            32
> > +#define CORESIGHT_SET_CLR_REG_COUNTER_SHIFT          5
> > +
> > +/* The number of 32-bit SET/CLR register that can be supported. */
> > +#define CORESIGHT_SET_CLR_REG_MAX_NUM ((PMCNTENCLR -
> PMCNTENSET) / sizeof(u32))
> > +
> > +static_assert((CORESIGHT_SET_CLR_REG_MAX_NUM *
> > +            CORESIGHT_SET_CLR_REG_COUNTER_NUM) >=
> > +           CORESIGHT_PMU_MAX_HW_CNTRS);
> > +
> > +/* Convert counter idx into SET/CLR register number. */
> > +#define CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx)                         \
> > +     (idx >> CORESIGHT_SET_CLR_REG_COUNTER_SHIFT)
> > +
> > +/* Convert counter idx into SET/CLR register bit. */
> > +#define CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx)                                \
> > +     (idx & (CORESIGHT_SET_CLR_REG_COUNTER_NUM - 1))
> > +
> > +#define CORESIGHT_ACTIVE_CPU_MASK                    0x0
> > +#define CORESIGHT_ASSOCIATED_CPU_MASK                        0x1
> > +
> > +#define CORESIGHT_EVENT_MASK                         0xFFFFFFFFULL
> > +#define CORESIGHT_FILTER_MASK                                0xFFFFFFFFULL
> > +#define CORESIGHT_FILTER_SHIFT                               32ULL
> > +
> > +/* Check if field f in flags is set with value v */
> > +#define CHECK_APMT_FLAG(flags, f, v) \
> > +     ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ##
> _ ## v))
> > +
> > +static unsigned long coresight_pmu_cpuhp_state;
> > +
> > +/*
> > + * In CoreSight PMU architecture, all of the MMIO registers are 32-bit
> except
> > + * counter register. The counter register can be implemented as 32-bit or
> 64-bit
> > + * register depending on the value of PMCFGR.SIZE field. For 64-bit
> access,
> > + * single-copy 64-bit atomic support is implementation defined. APMT
> node flag
> > + * is used to identify if the PMU supports 64-bit single copy atomic. If 64-
> bit
> > + * single copy atomic is not supported, the driver treats the register as a
> pair
> > + * of 32-bit register.
> > + */
> > +
> > +/*
> > + * Read 32-bit register.
> > + *
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + * @return 32-bit value of the register.
> > + */
> > +static inline u32 read_reg32(void __iomem *base, u32 offset)
> > +{
> > +     return readl(base + offset);
> > +}
> 
> read_reg32(x, y);
> readl(x + y);
> 
> These kind of wrappers are just about reasonable when they encapsulate a
> structure dereference or some computation to transform the offset, but
> having 13 extra lines plus 4 extra characters per callsite purely to
> obfuscate an addition seems objectively worse than not doing that.

Sure, we will replace these calls.

> 
> > +
> > +/*
> > + * Read 64-bit register using single 64-bit atomic copy.
> > + *
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + * @return 64-bit value of the register.
> > + */
> > +static u64 read_reg64(void __iomem *base, u32 offset)
> > +{
> > +     return readq(base + offset);
> > +}
> > +
> > +/*
> > + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> > + *
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + * @return 64-bit value of the register pair.
> > + */
> > +static u64 read_reg64_hilohi(void __iomem *base, u32 offset)
> > +{
> > +     u32 val_lo, val_hi;
> > +     u64 val;
> > +
> > +     /* Use high-low-high sequence to avoid tearing */
> > +     do {
> > +             val_hi = read_reg32(base, offset + 4);
> > +             val_lo = read_reg32(base, offset);
> > +     } while (val_hi != read_reg32(base, offset + 4));
> > +
> > +     val = (((u64)val_hi << 32) | val_lo);
> > +
> > +     return val;
> > +}
> > +
> > +/*
> > + * Write to 32-bit register.
> > + *
> > + * @val     : 32-bit value to write.
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + */
> > +static inline void write_reg32(u32 val, void __iomem *base, u32 offset)
> > +{
> > +     writel(val, base + offset);
> > +}
> > +
> > +/*
> > + * Write to 64-bit register using single 64-bit atomic copy.
> > + *
> > + * @val     : 64-bit value to write.
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + */
> > +static void write_reg64(u64 val, void __iomem *base, u32 offset)
> > +{
> > +     writeq(val, base + offset);
> > +}
> > +
> > +/*
> > + * Write to 64-bit register as a pair of 32-bit registers.
> > + *
> > + * @val     : 64-bit value to write.
> > + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
> > + * @offset  : register offset.
> > + *
> > + */
> > +static void write_reg64_lohi(u64 val, void __iomem *base, u32 offset)
> > +{
> > +     u32 val_lo, val_hi;
> > +
> > +     val_hi = upper_32_bits(val);
> > +     val_lo = lower_32_bits(val);
> > +
> > +     write_reg32(val_lo, base, offset);
> > +     write_reg32(val_hi, base, offset + 4);
> > +}
> 
> #include <linux/io-64-nonatomic-lo-hi.h>

Thanks for pointing this out. We will replace it with lo_hi_writeq.

> 
> > +
> > +/* Check if cycle counter is supported. */
> > +static inline bool support_cc(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return (coresight_pmu->pmcfgr & PMCFGR_CC);
> > +}
> > +
> > +/* Get counter size. */
> > +static inline u32 pmcfgr_size(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return (coresight_pmu->pmcfgr >> PMCFGR_SIZE_SHIFT) &
> > +            PMCFGR_SIZE_MASK;
> > +}
> > +
> > +/* Check if counter is implemented as 64-bit register. */
> > +static inline bool
> > +use_64b_counter_reg(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return (pmcfgr_size(coresight_pmu) > 31);
> > +}
> > +
> > +/* Get number of counters, minus one. */
> > +static inline u32 pmcfgr_n(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return (coresight_pmu->pmcfgr >> PMCFGR_N_SHIFT) &
> > +            PMCFGR_N_MASK;
> > +}
> > +
> > +ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
> > +                                    struct device_attribute *attr, char *buf)
> > +{
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     return sysfs_emit(buf, "event=0x%llx\n",
> > +                       (unsigned long long)eattr->var);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_event_show);
> > +
> > +/**
> > + * Event list of PMU that does not support cycle counter. Currently the
> > + * CoreSight PMU spec does not define standard events, so it is empty now.
> > + */
> > +static struct attribute *coresight_pmu_event_attrs[] = {
> > +     NULL,
> > +};
> > +
> > +/* Event list of PMU supporting cycle counter. */
> > +static struct attribute *coresight_pmu_event_attrs_cc[] = {
> > +     CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
> > +     NULL,
> > +};
> > +
> > +struct attribute **
> > +coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return (support_cc(coresight_pmu)) ? coresight_pmu_event_attrs_cc :
> > +                                          coresight_pmu_event_attrs;
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_get_event_attrs);
> 
> If cycle count is a standard but optional event, just include it in the
> standard event attrs and use .is_visible to filter it out when not
> present. No need for this overcomplicated machinery.
> 

Sure, thanks for pointing that out.

> > +
> > +ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
> > +                                     struct device_attribute *attr,
> > +                                     char *buf)
> > +{
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     return sysfs_emit(buf, "%s\n", (char *)eattr->var);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_format_show);
> > +
> > +static struct attribute *coresight_pmu_format_attrs[] = {
> > +     CORESIGHT_FORMAT_ATTR(event, "config:0-31"),
> > +     CORESIGHT_FORMAT_ATTR(filter, "config:32-63"),
> > +     NULL,
> > +};
> > +
> > +struct attribute **
> > +coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     return coresight_pmu_format_attrs;
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_get_format_attrs);
> > +
> > +u32 coresight_pmu_event_type(const struct perf_event *event)
> > +{
> > +     return event->attr.config & CORESIGHT_EVENT_MASK;
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_event_type);
> > +
> > +u32 coresight_pmu_event_filter(const struct perf_event *event)
> > +{
> > +     return (event->attr.config >> CORESIGHT_FILTER_SHIFT) &
> > +            CORESIGHT_FILTER_MASK;
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_event_filter);
> > +
> > +static ssize_t coresight_pmu_identifier_show(struct device *dev,
> > +                                          struct device_attribute *attr,
> > +                                          char *page)
> > +{
> > +     struct coresight_pmu *coresight_pmu =
> > +             to_coresight_pmu(dev_get_drvdata(dev));
> > +
> > +     return sysfs_emit(page, "%s\n", coresight_pmu->identifier);
> > +}
> > +
> > +static struct device_attribute coresight_pmu_identifier_attr =
> > +     __ATTR(identifier, 0444, coresight_pmu_identifier_show, NULL);
> > +
> > +static struct attribute *coresight_pmu_identifier_attrs[] = {
> > +     &coresight_pmu_identifier_attr.attr,
> > +     NULL,
> > +};
> > +
> > +static struct attribute_group coresight_pmu_identifier_attr_group = {
> > +     .attrs = coresight_pmu_identifier_attrs,
> > +};
> > +
> > +const char *
> > +coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
> > +{
> > +     const char *identifier =
> > +             devm_kasprintf(coresight_pmu->dev, GFP_KERNEL, "%x",
> > +                            coresight_pmu->impl.pmiidr);
> > +     return identifier;
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_get_identifier);
> > +
> > +static ssize_t coresight_pmu_cpumask_show(struct device *dev,
> > +                                       struct device_attribute *attr,
> > +                                       char *buf)
> > +{
> > +     struct pmu *pmu = dev_get_drvdata(dev);
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     unsigned long mask_id = (unsigned long)eattr->var;
> > +     const cpumask_t *cpumask;
> > +
> > +     switch (mask_id) {
> > +     case CORESIGHT_ACTIVE_CPU_MASK:
> > +             cpumask = &coresight_pmu->active_cpu;
> > +             break;
> > +     case CORESIGHT_ASSOCIATED_CPU_MASK:
> > +             cpumask = &coresight_pmu->associated_cpus;
> > +             break;
> > +     default:
> > +             return 0;
> > +     }
> > +     return cpumap_print_to_pagebuf(true, buf, cpumask);
> > +}
> > +
> > +static struct attribute *coresight_pmu_cpumask_attrs[] = {
> > +     CORESIGHT_CPUMASK_ATTR(cpumask, CORESIGHT_ACTIVE_CPU_MASK),
> > +     CORESIGHT_CPUMASK_ATTR(associated_cpus, CORESIGHT_ASSOCIATED_CPU_MASK),
> > +     NULL,
> > +};
> > +
> > +static struct attribute_group coresight_pmu_cpumask_attr_group = {
> > +     .attrs = coresight_pmu_cpumask_attrs,
> > +};
> > +
> > +static const struct coresight_pmu_impl_ops default_impl_ops = {
> > +     .get_event_attrs        = coresight_pmu_get_event_attrs,
> > +     .get_format_attrs       = coresight_pmu_get_format_attrs,
> > +     .get_identifier         = coresight_pmu_get_identifier,
> > +     .is_cc_event            = coresight_pmu_is_cc_event,
> > +     .event_type             = coresight_pmu_event_type,
> > +     .event_filter           = coresight_pmu_event_filter
> > +};
> > +
> > +struct impl_match {
> > +     u32 jedec_jep106_id;
> > +     int (*impl_init_ops)(struct coresight_pmu *coresight_pmu);
> > +};
> > +
> > +static const struct impl_match impl_match[] = {
> > +     {}
> > +};
> > +
> > +static int coresight_pmu_init_impl_ops(struct coresight_pmu *coresight_pmu)
> > +{
> > +     int idx, ret;
> > +     u32 jedec_id;
> > +     struct acpi_apmt_node *apmt_node = coresight_pmu->apmt_node;
> > +     const struct impl_match *match = impl_match;
> > +
> > +     /*
> > +      * Get PMU implementer and product id from APMT node.
> > +      * If APMT node doesn't have implementer/product id, try get it
> > +      * from PMIIDR.
> > +      */
> > +     coresight_pmu->impl.pmiidr =
> > +             (apmt_node->impl_id) ? apmt_node->impl_id :
> > +                                    read_reg32(coresight_pmu->base0, PMIIDR);
> 
> The spec says the opposite, that the APMT field should be ignored if
> PMIIDR or PMPIDR is present.
> 

This is to cover the case where the PMIIDR value is incorrect due to a mistake in HW.
Hopefully, we can get an update to the spec soon.

> > +     jedec_id = coresight_pmu->impl.pmiidr & PMIIDR_IMPLEMENTER_MASK;
> > +
> > +     /* Find implementer specific attribute ops. */
> > +     for (idx = 0; match->jedec_jep106_id; match++, idx++) {
> > +             if (match->jedec_jep106_id == jedec_id) {
> 
> I reckon we could simply have (value,mask) pairs in impl_match to
> directly match the whole IIDR value to some implementation ops, and save
> some bother here. It could always be refactored if and when a sufficient
> number of different implementations exist to make that approach unwieldy.

Sure, that's reasonable.

> 
> > +                     ret = match->impl_init_ops(coresight_pmu);
> > +                     if (ret)
> > +                             return ret;
> > +
> > +                     return 0;
> > +             }
> > +     }
> > +
> > +     /* We don't find implementer specific attribute ops, use default. */
> > +     coresight_pmu->impl.ops = &default_impl_ops;
> > +     return 0;
> > +}
> > +
> > +static struct attribute_group *
> > +coresight_pmu_alloc_event_attr_group(struct coresight_pmu *coresight_pmu)
> > +{
> > +     struct attribute_group *event_group;
> > +     struct device *dev = coresight_pmu->dev;
> > +
> > +     event_group =
> > +             devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> > +     if (!event_group)
> > +             return NULL;
> > +
> > +     event_group->name = "events";
> > +     event_group->attrs =
> > +             coresight_pmu->impl.ops->get_event_attrs(coresight_pmu);
> > +
> > +     return event_group;
> > +}
> > +
> > +static struct attribute_group *
> > +coresight_pmu_alloc_format_attr_group(struct coresight_pmu *coresight_pmu)
> > +{
> > +     struct attribute_group *format_group;
> > +     struct device *dev = coresight_pmu->dev;
> > +
> > +     format_group =
> > +             devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> > +     if (!format_group)
> > +             return NULL;
> > +
> > +     format_group->name = "format";
> > +     format_group->attrs =
> > +             coresight_pmu->impl.ops->get_format_attrs(coresight_pmu);
> > +
> > +     return format_group;
> > +}
> > +
> > +static struct attribute_group **
> > +coresight_pmu_alloc_attr_group(struct coresight_pmu *coresight_pmu)
> > +{
> > +     const struct coresight_pmu_impl_ops *impl_ops;
> > +     struct attribute_group **attr_groups = NULL;
> > +     struct device *dev = coresight_pmu->dev;
> > +     int ret;
> > +
> > +     ret = coresight_pmu_init_impl_ops(coresight_pmu);
> > +     if (ret)
> > +             return NULL;
> > +
> > +     impl_ops = coresight_pmu->impl.ops;
> > +
> > +     coresight_pmu->identifier = impl_ops->get_identifier(coresight_pmu);
> > +
> > +     attr_groups = devm_kzalloc(dev, 5 * sizeof(struct attribute_group *),
> > +                                GFP_KERNEL);
> > +     if (!attr_groups)
> > +             return NULL;
> > +
> > +     attr_groups[0] = coresight_pmu_alloc_event_attr_group(coresight_pmu);
> > +     attr_groups[1] = coresight_pmu_alloc_format_attr_group(coresight_pmu);
> > +     attr_groups[2] = &coresight_pmu_identifier_attr_group;
> > +     attr_groups[3] = &coresight_pmu_cpumask_attr_group;
> > +
> > +     return attr_groups;
> > +}
> > +
> > +static inline void
> > +coresight_pmu_start_counters(struct coresight_pmu *coresight_pmu)
> > +{
> > +     u32 pmcr;
> > +
> > +     pmcr = read_reg32(coresight_pmu->base0, PMCR);
> > +     pmcr |= PMCR_E;
> > +     write_reg32(pmcr, coresight_pmu->base0, PMCR);
> > +}
> > +
> > +static inline void
> > +coresight_pmu_stop_counters(struct coresight_pmu *coresight_pmu)
> > +{
> > +     u32 pmcr;
> > +
> > +     pmcr = read_reg32(coresight_pmu->base0, PMCR);
> > +     pmcr &= ~PMCR_E;
> > +     write_reg32(pmcr, coresight_pmu->base0, PMCR);
> > +}
> 
> I'm inclined to think these shouldn't be read-modify-write
> implementations. Arguably the driver should reset the control register
> to a known state initially, so from then on it can simply write new
> values based on what it knows it's changing.
> 
> AFAICS from the spec only the PMCR.E bit has a defined reset value, so
> preserving random values in other bits like FZO and D is sure to be fun.
> 

Thanks, we will reset PMCR during driver init.

> > +
> > +static void coresight_pmu_enable(struct pmu *pmu)
> > +{
> > +     int enabled;
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> > +
> > +     enabled = bitmap_weight(coresight_pmu->hw_events.used_ctrs,
> > +                             CORESIGHT_PMU_MAX_HW_CNTRS);
> > +
> > +     if (!enabled)
> > +             return;
> 
> Use bitmap_empty() for checking if a bitmap is empty.

Sure, we can use this.

> 
> > +
> > +     coresight_pmu_start_counters(coresight_pmu);
> > +}
> > +
> > +static void coresight_pmu_disable(struct pmu *pmu)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
> > +
> > +     coresight_pmu_stop_counters(coresight_pmu);
> > +}
> > +
> > +static inline bool is_cycle_cntr_idx(const struct perf_event *event)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     int idx = event->hw.idx;
> > +
> > +     return (support_cc(coresight_pmu) && idx == CORESIGHT_PMU_IDX_CCNTR);
> 
> If we don't support cycle counting, cycles count events should have been
> rejected in event_init. If they're able to propagate further than that
> 

Not sure I understand; do you mean the check for cycle counter support is unnecessary?
This function is actually called by coresight_pmu_start, which runs after event_init has passed.
coresight_pmu_start is not aware whether the cycle counter is supported, so we need to keep checking it.

> > +}
> > +
> > +bool coresight_pmu_is_cc_event(const struct perf_event *event)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     u32 evtype = coresight_pmu->impl.ops->event_type(event);
> > +
> > +     return (support_cc(coresight_pmu) &&
> 
> Ditto.

This function is called by event_init to validate the event and find available counters.

> 
> > +             evtype == CORESIGHT_PMU_EVT_CYCLES_DEFAULT);
> > +}
> > +EXPORT_SYMBOL_GPL(coresight_pmu_is_cc_event);
> > +
> > +static int
> > +coresight_pmu_get_event_idx(struct coresight_pmu_hw_events *hw_events,
> > +                         struct perf_event *event)
> > +{
> > +     int idx, reserve_cc;
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +
> > +     if (coresight_pmu->impl.ops->is_cc_event(event)) {
> > +             /* Search for available cycle counter. */
> > +             if (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
> > +                                  hw_events->used_ctrs))
> > +                     return -EAGAIN;
> > +
> > +             return CORESIGHT_PMU_IDX_CCNTR;
> > +     }
> > +
> > +     /*
> > +      * CoreSight PMU can support up to 256 counters. The cycle counter is
> > +      * always on counter[31]. To prevent regular event from using cycle
> > +      * counter, we reserve the cycle counter bit temporarily.
> > +      */
> > +     reserve_cc = 0;
> > +     if (support_cc(coresight_pmu) &&
> > +         coresight_pmu->num_adj_counters >= CORESIGHT_PMU_IDX_CCNTR)
> > +             reserve_cc = (test_and_set_bit(CORESIGHT_PMU_IDX_CCNTR,
> > +                                            hw_events->used_ctrs) == 0);
> 
> It would seem a lot easier to reserve PMEVCNTR[31] permanently and track
> allocation of PMCCNTR with a separate flag, when appropriate.

The purpose was to avoid using two flags when keeping track of the used counters.
But you have a valid concern about the potentially significant number of unused bits
and the compiler warning due to the large stack size. We will revisit this.

> 
> > +
> > +     /* Search available regular counter from the used counter bitmap. */
> > +     idx = find_first_zero_bit(hw_events->used_ctrs,
> > +                               coresight_pmu->num_adj_counters);
> > +
> > +     /* Restore cycle counter bit. */
> > +     if (reserve_cc)
> > +             clear_bit(CORESIGHT_PMU_IDX_CCNTR, hw_events->used_ctrs);
> > +
> > +     if (idx >= coresight_pmu->num_adj_counters)
> > +             return -EAGAIN;
> > +
> > +     set_bit(idx, hw_events->used_ctrs);
> > +
> > +     return idx;
> > +}
> > +
> > +static bool
> > +coresight_pmu_validate_event(struct pmu *pmu,
> > +                          struct coresight_pmu_hw_events *hw_events,
> > +                          struct perf_event *event)
> > +{
> > +     if (is_software_event(event))
> > +             return true;
> > +
> > +     /* Reject groups spanning multiple HW PMUs. */
> > +     if (event->pmu != pmu)
> > +             return false;
> > +
> > +     return (coresight_pmu_get_event_idx(hw_events, event) >= 0);
> > +}
> > +
> > +/**
> > + * Make sure the group of events can be scheduled at once
> > + * on the PMU.
> > + */
> > +static bool coresight_pmu_validate_group(struct perf_event *event)
> > +{
> > +     struct perf_event *sibling, *leader = event->group_leader;
> > +     struct coresight_pmu_hw_events fake_hw_events;
> 
> Do you not get a compile-time warning about this?

Thanks for spotting this. I checked my build log and I can see the warning.
We will update it.

> 
> > +     if (event->group_leader == event)
> > +             return true;
> > +
> > +     memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> > +
> > +     if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events, leader))
> > +             return false;
> > +
> > +     for_each_sibling_event(sibling, leader) {
> > +             if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events,
> > +                                               sibling))
> > +                     return false;
> > +     }
> > +
> > +     return coresight_pmu_validate_event(event->pmu, &fake_hw_events, event);
> > +}
> > +
> > +static int coresight_pmu_event_init(struct perf_event *event)
> > +{
> > +     struct coresight_pmu *coresight_pmu;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     coresight_pmu = to_coresight_pmu(event->pmu);
> > +
> > +     /**
> 
> This isn't kerneldoc.

Sure, we will make the update.

> 
> > +      * Following other "uncore" PMUs, we do not support sampling mode or
> > +      * attach to a task (per-process mode).
> > +      */
> > +     if (is_sampling_event(event)) {
> > +             dev_dbg(coresight_pmu->pmu.dev,
> > +                     "Can't support sampling events\n");
> > +             return -EOPNOTSUPP;
> > +     }
> > +
> > +     if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> > +             dev_dbg(coresight_pmu->pmu.dev,
> > +                     "Can't support per-task counters\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     /**
> 
> Ditto.
> 
> > +      * Make sure the CPU assignment is on one of the CPUs associated with
> > +      * this PMU.
> > +      */
> > +     if (!cpumask_test_cpu(event->cpu, &coresight_pmu->associated_cpus)) {
> > +             dev_dbg(coresight_pmu->pmu.dev,
> > +                     "Requested cpu is not associated with the PMU\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Enforce the current active CPU to handle the events in this PMU. */
> > +     event->cpu = cpumask_first(&coresight_pmu->active_cpu);
> > +     if (event->cpu >= nr_cpu_ids)
> > +             return -EINVAL;
> > +
> > +     if (!coresight_pmu_validate_group(event))
> > +             return -EINVAL;
> > +
> > +     /**
> 
> Ditto.
> 
> > +      * We don't assign an index until we actually place the event onto
> > +      * hardware. Use -1 to signify that we haven't decided where to put it
> > +      * yet.
> > +      */
> > +     hwc->idx = -1;
> > +     hwc->config_base = coresight_pmu->impl.ops->event_type(event);
> > +
> > +     return 0;
> > +}
> > +
> > +static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
> > +{
> > +     return (PMEVCNTR_LO + (reg_sz * ctr_idx));
> > +}
> > +
> > +static void coresight_pmu_write_counter(struct perf_event *event, u64 val)
> > +{
> > +     u32 offset;
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +
> > +     if (use_64b_counter_reg(coresight_pmu)) {
> > +             offset = counter_offset(sizeof(u64), event->hw.idx);
> > +
> > +             coresight_pmu->write_reg64(val, coresight_pmu->base1, offset);
> > +     } else {
> > +             offset = counter_offset(sizeof(u32), event->hw.idx);
> > +
> > +             write_reg32(lower_32_bits(val), coresight_pmu->base1, offset);
> > +     }
> > +}
> > +
> > +static u64 coresight_pmu_read_counter(struct perf_event *event)
> > +{
> > +     u32 offset;
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +
> > +     if (use_64b_counter_reg(coresight_pmu)) {
> > +             offset = counter_offset(sizeof(u64), event->hw.idx);
> > +             return coresight_pmu->read_reg64(coresight_pmu->base1, offset);
> > +     }
> > +
> > +     offset = counter_offset(sizeof(u32), event->hw.idx);
> > +     return read_reg32(coresight_pmu->base1, offset);
> > +}
> > +
> > +/**
> > + * coresight_pmu_set_event_period: Set the period for the counter.
> > + *
> > + * To handle cases of extreme interrupt latency, we program
> > + * the counter with half of the max count for the counters.
> > + */
> > +static void coresight_pmu_set_event_period(struct perf_event *event)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     u64 val = GENMASK_ULL(pmcfgr_size(coresight_pmu), 0) >> 1;
> > +
> > +     local64_set(&event->hw.prev_count, val);
> > +     coresight_pmu_write_counter(event, val);
> > +}
> > +
> > +static void coresight_pmu_enable_counter(struct coresight_pmu *coresight_pmu,
> > +                                      int idx)
> > +{
> > +     u32 reg_id, reg_bit, inten_off, cnten_off;
> > +
> > +     reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
> > +     reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
> > +
> > +     inten_off = PMINTENSET + (4 * reg_id);
> > +     cnten_off = PMCNTENSET + (4 * reg_id);
> > +
> > +     write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
> > +     write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
> > +}
> > +
> > +static void coresight_pmu_disable_counter(struct coresight_pmu *coresight_pmu,
> > +                                       int idx)
> > +{
> > +     u32 reg_id, reg_bit, inten_off, cnten_off;
> > +
> > +     reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
> > +     reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
> > +
> > +     inten_off = PMINTENCLR + (4 * reg_id);
> > +     cnten_off = PMCNTENCLR + (4 * reg_id);
> > +
> > +     write_reg32(BIT(reg_bit), coresight_pmu->base0, cnten_off);
> > +     write_reg32(BIT(reg_bit), coresight_pmu->base0, inten_off);
> > +}
> > +
> > +static void coresight_pmu_event_update(struct perf_event *event)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     u64 delta, prev, now;
> > +
> > +     do {
> > +             prev = local64_read(&hwc->prev_count);
> > +             now = coresight_pmu_read_counter(event);
> > +     } while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> > +
> > +     delta = (now - prev) & GENMASK_ULL(pmcfgr_size(coresight_pmu), 0);
> > +     local64_add(delta, &event->count);
> > +}
> > +
> > +static inline void coresight_pmu_set_event(struct coresight_pmu *coresight_pmu,
> > +                                        struct hw_perf_event *hwc)
> > +{
> > +     u32 offset = PMEVTYPER + (4 * hwc->idx);
> > +
> > +     write_reg32(hwc->config_base, coresight_pmu->base0, offset);
> > +}
> > +
> > +static inline void
> > +coresight_pmu_set_ev_filter(struct coresight_pmu *coresight_pmu,
> > +                         struct hw_perf_event *hwc, u32 filter)
> > +{
> > +     u32 offset = PMEVFILTR + (4 * hwc->idx);
> > +
> > +     write_reg32(filter, coresight_pmu->base0, offset);
> > +}
> > +
> > +static inline void
> > +coresight_pmu_set_cc_filter(struct coresight_pmu *coresight_pmu, u32 filter)
> > +{
> > +     u32 offset = PMCCFILTR;
> > +
> > +     write_reg32(filter, coresight_pmu->base0, offset);
> > +}
> > +
> > +static void coresight_pmu_start(struct perf_event *event, int pmu_flags)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     u32 filter;
> > +
> > +     /* We always reprogram the counter */
> > +     if (pmu_flags & PERF_EF_RELOAD)
> > +             WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
> > +
> > +     coresight_pmu_set_event_period(event);
> > +
> > +     filter = coresight_pmu->impl.ops->event_filter(event);
> > +
> > +     if (is_cycle_cntr_idx(event)) {
> > +             coresight_pmu_set_cc_filter(coresight_pmu, filter);
> > +     } else {
> > +             coresight_pmu_set_event(coresight_pmu, hwc);
> > +             coresight_pmu_set_ev_filter(coresight_pmu, hwc, filter);
> > +     }
> > +
> > +     hwc->state = 0;
> > +
> > +     coresight_pmu_enable_counter(coresight_pmu, hwc->idx);
> > +}
> > +
> > +static void coresight_pmu_stop(struct perf_event *event, int pmu_flags)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     if (hwc->state & PERF_HES_STOPPED)
> > +             return;
> > +
> > +     coresight_pmu_disable_counter(coresight_pmu, hwc->idx);
> > +     coresight_pmu_event_update(event);
> > +
> > +     hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> > +}
> > +
> > +static int coresight_pmu_add(struct perf_event *event, int flags)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     int idx;
> > +
> > +     if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> > +                                        &coresight_pmu->associated_cpus)))
> > +             return -ENOENT;
> > +
> > +     idx = coresight_pmu_get_event_idx(hw_events, event);
> > +     if (idx < 0)
> > +             return idx;
> > +
> > +     hw_events->events[idx] = event;
> > +     hwc->idx = idx;
> > +     hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> > +
> > +     if (flags & PERF_EF_START)
> > +             coresight_pmu_start(event, PERF_EF_RELOAD);
> > +
> > +     /* Propagate changes to the userspace mapping. */
> > +     perf_event_update_userpage(event);
> > +
> > +     return 0;
> > +}
> > +
> > +static void coresight_pmu_del(struct perf_event *event, int flags)
> > +{
> > +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
> > +     struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     int idx = hwc->idx;
> > +
> > +     coresight_pmu_stop(event, PERF_EF_UPDATE);
> > +
> > +     hw_events->events[idx] = NULL;
> > +
> > +     clear_bit(idx, hw_events->used_ctrs);
> > +
> > +     perf_event_update_userpage(event);
> > +}
> > +
> > +static void coresight_pmu_read(struct perf_event *event)
> > +{
> > +     coresight_pmu_event_update(event);
> > +}
> > +
> > +static int coresight_pmu_alloc(struct platform_device *pdev,
> > +                            struct coresight_pmu **coresight_pmu)
> > +{
> > +     struct acpi_apmt_node *apmt_node;
> > +     struct device *dev;
> > +     struct coresight_pmu *pmu;
> > +
> > +     dev = &pdev->dev;
> > +     apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> > +     if (!apmt_node) {
> > +             dev_err(dev, "failed to get APMT node\n");
> > +             return -ENOMEM;
> > +     }
> > +
> > +     pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
> > +     if (!pmu)
> > +             return -ENOMEM;
> > +
> > +     *coresight_pmu = pmu;
> > +
> > +     pmu->dev = dev;
> > +     pmu->apmt_node = apmt_node;
> > +     pmu->name =
> > +             devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);
> > +
> > +     platform_set_drvdata(pdev, coresight_pmu);
> > +
> > +     return 0;
> > +}
> > +
> > +static int coresight_pmu_init_mmio(struct coresight_pmu *coresight_pmu)
> > +{
> > +     struct device *dev;
> > +     struct platform_device *pdev;
> > +     struct resource *res;
> > +     struct acpi_apmt_node *apmt_node;
> > +
> > +     dev = coresight_pmu->dev;
> > +     pdev = to_platform_device(dev);
> > +     apmt_node = coresight_pmu->apmt_node;
> > +
> > +     /* Base address for page 0. */
> > +     res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
> > +     if (!res) {
> > +             dev_err(dev, "failed to get page-0 resource\n");
> > +             return -ENOMEM;
> > +     }
> > +
> > +     coresight_pmu->base0 = devm_ioremap_resource(dev, res);
> > +     if (IS_ERR(coresight_pmu->base0)) {
> > +             dev_err(dev, "ioremap failed for page-0 resource\n");
> > +             return PTR_ERR(coresight_pmu->base0);
> > +     }
> 
> devm_platform_ioremap_resource()

Thanks, we will update it.

> 
> > +     /* Base address for page 1 if supported. Otherwise point it to page 0. */
> > +     coresight_pmu->base1 = coresight_pmu->base0;
> > +     if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
> > +             res = platform_get_resource(pdev, IORESOURCE_MEM, 1);
> > +             if (!res) {
> > +                     dev_err(dev, "failed to get page-1 resource\n");
> > +                     return -ENOMEM;
> > +             }
> > +
> > +             coresight_pmu->base1 = devm_ioremap_resource(dev, res);
> > +             if (IS_ERR(coresight_pmu->base1)) {
> > +                     dev_err(dev, "ioremap failed for page-1 resource\n");
> > +                     return PTR_ERR(coresight_pmu->base1);
> > +             }
> 
> Ditto.
> 
> > +     }
> > +
> > +     if (CHECK_APMT_FLAG(apmt_node->flags, ATOMIC, SUPP)) {
> > +             coresight_pmu->read_reg64 = &read_reg64;
> > +             coresight_pmu->write_reg64 = &write_reg64;
> > +     } else {
> > +             coresight_pmu->read_reg64 = &read_reg64_hilohi;
> > +             coresight_pmu->write_reg64 = &write_reg64_lohi;
> > +     }
> > +
> > +     coresight_pmu->pmcfgr = read_reg32(coresight_pmu->base0, PMCFGR);
> > +
> > +     coresight_pmu->num_adj_counters = pmcfgr_n(coresight_pmu) + 1;
> > +
> > +     if (support_cc(coresight_pmu)) {
> > +             /**
> > +              * Exclude the cycle counter if there is a gap between
> > +              * cycle counter id and the last regular event counter id.
> > +              */
> > +             if (coresight_pmu->num_adj_counters <= CORESIGHT_PMU_IDX_CCNTR)
> > +                     coresight_pmu->num_adj_counters -= 1;
> 
> As before, I think it would be a fair bit clearer to maintain a
> distinction between the number of PMEV{TYPE,CNT,FILT}R registers present
> and the number of logical counters actually usable.
> 

We will revisit this.

> > +     }
> > +
> > +     coresight_pmu->num_set_clr_reg =
> > +             round_up(coresight_pmu->num_adj_counters,
> > +                      CORESIGHT_SET_CLR_REG_COUNTER_NUM) /
> > +             CORESIGHT_SET_CLR_REG_COUNTER_NUM;
> 
> DIV_ROUND_UP()

Thanks, we will update it.

> 
> > +
> > +     return 0;
> > +}
> > +
> > +static inline int
> > +coresight_pmu_get_reset_overflow(struct coresight_pmu *coresight_pmu,
> > +                              u32 *pmovs)
> > +{
> > +     int i;
> > +     u32 pmovclr_offset = PMOVSCLR;
> > +     u32 has_overflowed = 0;
> > +
> > +     for (i = 0; i < coresight_pmu->num_set_clr_reg; ++i) {
> > +             pmovs[i] = read_reg32(coresight_pmu->base1, pmovclr_offset);
> > +             has_overflowed |= pmovs[i];
> > +             write_reg32(pmovs[i], coresight_pmu->base1, pmovclr_offset);
> > +             pmovclr_offset += sizeof(u32);
> > +     }
> > +
> > +     return has_overflowed != 0;
> > +}
> > +
> > +static irqreturn_t coresight_pmu_handle_irq(int irq_num, void *dev)
> > +{
> > +     int idx, has_overflowed;
> > +     struct coresight_pmu *coresight_pmu = dev;
> > +     u32 pmovs[CORESIGHT_SET_CLR_REG_MAX_NUM] = { 0 };
> > +     bool handled = false;
> > +
> > +     coresight_pmu_stop_counters(coresight_pmu);
> > +
> > +     has_overflowed = coresight_pmu_get_reset_overflow(coresight_pmu, pmovs);
> > +     if (!has_overflowed)
> > +             goto done;
> > +
> > +     for_each_set_bit(idx, (unsigned long *)pmovs,
> > +                      CORESIGHT_PMU_MAX_HW_CNTRS) {
> 
> Why waste time iterating over a probably significant number of
> irrelevant bits?

Yup, this is unnecessary. We will update it along with the change to how
the used counters are tracked.

> 
> > +             struct perf_event *event = coresight_pmu->hw_events.events[idx];
> > +
> > +             if (!event)
> > +                     continue;
> > +
> > +             coresight_pmu_event_update(event);
> > +             coresight_pmu_set_event_period(event);
> > +
> > +             handled = true;
> > +     }
> > +
> > +done:
> > +     coresight_pmu_start_counters(coresight_pmu);
> > +     return IRQ_RETVAL(handled);
> > +}
> > +
> > +static int coresight_pmu_request_irq(struct coresight_pmu *coresight_pmu)
> > +{
> > +     int irq, ret;
> > +     struct device *dev;
> > +     struct platform_device *pdev;
> > +     struct acpi_apmt_node *apmt_node;
> > +
> > +     dev = coresight_pmu->dev;
> > +     pdev = to_platform_device(dev);
> > +     apmt_node = coresight_pmu->apmt_node;
> > +
> > +     /* Skip IRQ request if the PMU does not support overflow interrupt. */
> > +     if (apmt_node->ovflw_irq == 0)
> > +             return 0;
> > +
> > +     irq = platform_get_irq(pdev, 0);
> > +     if (irq < 0)
> > +             return irq;
> > +
> > +     ret = devm_request_irq(dev, irq, coresight_pmu_handle_irq,
> > +                            IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
> > +                            coresight_pmu);
> > +     if (ret) {
> > +             dev_err(dev, "Could not request IRQ %d\n", irq);
> > +             return ret;
> > +     }
> > +
> > +     coresight_pmu->irq = irq;
> > +
> > +     return 0;
> > +}
> > +
> > +static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
> > +{
> > +     u32 acpi_uid;
> > +     struct device *cpu_dev = get_cpu_device(cpu);
> > +     struct acpi_device *acpi_dev = ACPI_COMPANION(cpu_dev);
> > +     int level = 0;
> > +
> > +     if (!cpu_dev)
> > +             return -ENODEV;
> > +
> > +     while (acpi_dev) {
> > +             if (!strcmp(acpi_device_hid(acpi_dev),
> > +                         ACPI_PROCESSOR_CONTAINER_HID) &&
> > +                 !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
> > +                 acpi_uid == container_uid)
> > +                     return 0;
> > +
> > +             acpi_dev = acpi_dev->parent;
> > +             level++;
> > +     }
> > +
> > +     return -ENODEV;
> > +}
> > +
> > +static int coresight_pmu_get_cpus(struct coresight_pmu *coresight_pmu)
> > +{
> > +     struct acpi_apmt_node *apmt_node;
> > +     int affinity_flag;
> > +     int cpu;
> > +
> > +     apmt_node = coresight_pmu->apmt_node;
> > +     affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
> > +
> > +     if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
> > +             for_each_possible_cpu(cpu) {
> > +                     if (apmt_node->proc_affinity ==
> > +                         get_acpi_id_for_cpu(cpu)) {
> > +                             cpumask_set_cpu(
> > +                                     cpu, &coresight_pmu->associated_cpus);
> > +                             break;
> > +                     }
> > +             }
> > +     } else {
> > +             for_each_possible_cpu(cpu) {
> > +                     if (coresight_pmu_find_cpu_container(
> > +                                 cpu, apmt_node->proc_affinity))
> > +                             continue;
> > +
> > +                     cpumask_set_cpu(cpu, &coresight_pmu->associated_cpus);
> > +             }
> > +     }
> > +
> > +     return 0;
> > +}
> > +
> > +static int coresight_pmu_register_pmu(struct coresight_pmu *coresight_pmu)
> > +{
> > +     int ret;
> > +     struct attribute_group **attr_groups;
> > +
> > +     attr_groups = coresight_pmu_alloc_attr_group(coresight_pmu);
> > +     if (!attr_groups) {
> > +             ret = -ENOMEM;
> > +             return ret;
> > +     }
> > +
> > +     ret = cpuhp_state_add_instance(coresight_pmu_cpuhp_state,
> > +                                    &coresight_pmu->cpuhp_node);
> > +     if (ret)
> > +             return ret;
> > +
> > +     coresight_pmu->pmu = (struct pmu){
> > +             .task_ctx_nr    = perf_invalid_context,
> > +             .module         = THIS_MODULE,
> > +             .pmu_enable     = coresight_pmu_enable,
> > +             .pmu_disable    = coresight_pmu_disable,
> > +             .event_init     = coresight_pmu_event_init,
> > +             .add            = coresight_pmu_add,
> > +             .del            = coresight_pmu_del,
> > +             .start          = coresight_pmu_start,
> > +             .stop           = coresight_pmu_stop,
> > +             .read           = coresight_pmu_read,
> > +             .attr_groups    = (const struct attribute_group **)attr_groups,
> > +             .capabilities   = PERF_PMU_CAP_NO_EXCLUDE,
> > +     };
> > +
> > +     ret = perf_pmu_register(&coresight_pmu->pmu, coresight_pmu->name, -1);
> > +     if (ret) {
> > +             cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
> > +                                         &coresight_pmu->cpuhp_node);
> > +     }
> > +
> > +     return ret;
> > +}
> > +
> > +static int coresight_pmu_device_probe(struct platform_device *pdev)
> > +{
> > +     int ret;
> > +     struct coresight_pmu *coresight_pmu;
> > +
> > +     ret = coresight_pmu_alloc(pdev, &coresight_pmu);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = coresight_pmu_init_mmio(coresight_pmu);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = coresight_pmu_request_irq(coresight_pmu);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = coresight_pmu_get_cpus(coresight_pmu);
> > +     if (ret)
> > +             return ret;
> > +
> > +     ret = coresight_pmu_register_pmu(coresight_pmu);
> > +     if (ret)
> > +             return ret;
> > +
> > +     return 0;
> > +}
> > +
> > +static int coresight_pmu_device_remove(struct platform_device *pdev)
> > +{
> > +     struct coresight_pmu *coresight_pmu = platform_get_drvdata(pdev);
> > +
> > +     perf_pmu_unregister(&coresight_pmu->pmu);
> > +     cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
> > +                                 &coresight_pmu->cpuhp_node);
> > +
> > +     return 0;
> > +}
> > +
> > +static struct platform_driver coresight_pmu_driver = {
> > +     .driver = {
> > +                     .name = "arm-coresight-pmu",
> > +                     .suppress_bind_attrs = true,
> > +             },
> > +     .probe = coresight_pmu_device_probe,
> > +     .remove = coresight_pmu_device_remove,
> > +};
> > +
> > +static void coresight_pmu_set_active_cpu(int cpu,
> > +                                      struct coresight_pmu *coresight_pmu)
> > +{
> > +     cpumask_set_cpu(cpu, &coresight_pmu->active_cpu);
> > +     WARN_ON(irq_set_affinity(coresight_pmu->irq,
> > +                              &coresight_pmu->active_cpu));
> > +}
> > +
> > +static int coresight_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
> > +{
> > +     struct coresight_pmu *coresight_pmu =
> > +             hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
> > +
> > +     if (!cpumask_test_cpu(cpu, &coresight_pmu->associated_cpus))
> > +             return 0;
> > +
> > +     /* If the PMU is already managed, there is nothing to do */
> > +     if (!cpumask_empty(&coresight_pmu->active_cpu))
> > +             return 0;
> > +
> > +     /* Use this CPU for event counting */
> > +     coresight_pmu_set_active_cpu(cpu, coresight_pmu);
> > +
> > +     return 0;
> > +}
> > +
> > +static int coresight_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
> > +{
> > +     int dst;
> > +     struct cpumask online_supported;
> > +
> > +     struct coresight_pmu *coresight_pmu =
> > +             hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
> > +
> > +     /* Nothing to do if this CPU doesn't own the PMU */
> > +     if (!cpumask_test_and_clear_cpu(cpu, &coresight_pmu->active_cpu))
> > +             return 0;
> > +
> > +     /* Choose a new CPU to migrate ownership of the PMU to */
> > +     cpumask_and(&online_supported, &coresight_pmu->associated_cpus,
> > +                 cpu_online_mask);
> > +     dst = cpumask_any_but(&online_supported, cpu);
> > +     if (dst >= nr_cpu_ids)
> > +             return 0;
> > +
> > +     /* Use this CPU for event counting */
> > +     perf_pmu_migrate_context(&coresight_pmu->pmu, cpu, dst);
> > +     coresight_pmu_set_active_cpu(dst, coresight_pmu);
> > +
> > +     return 0;
> > +}
> > +
> > +static int __init coresight_pmu_init(void)
> > +{
> > +     int ret;
> > +
> > +     ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, PMUNAME,
> > +                                   coresight_pmu_cpu_online,
> > +                                   coresight_pmu_cpu_teardown);
> > +     if (ret < 0)
> > +             return ret;
> > +     coresight_pmu_cpuhp_state = ret;
> > +     return platform_driver_register(&coresight_pmu_driver);
> > +}
> > +
> > +static void __exit coresight_pmu_exit(void)
> > +{
> > +     platform_driver_unregister(&coresight_pmu_driver);
> > +     cpuhp_remove_multi_state(coresight_pmu_cpuhp_state);
> > +}
> > +
> > +module_init(coresight_pmu_init);
> > +module_exit(coresight_pmu_exit);
> > diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.h b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
> > new file mode 100644
> > index 000000000000..59fb40eafe45
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
> > @@ -0,0 +1,147 @@
> > +/* SPDX-License-Identifier: GPL-2.0
> > + *
> > + * ARM CoreSight PMU driver.
> > + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > + *
> > + */
> > +
> > +#ifndef __ARM_CORESIGHT_PMU_H__
> > +#define __ARM_CORESIGHT_PMU_H__
> > +
> > +#include <linux/acpi.h>
> > +#include <linux/bitfield.h>
> > +#include <linux/cpumask.h>
> > +#include <linux/device.h>
> > +#include <linux/kernel.h>
> > +#include <linux/module.h>
> > +#include <linux/perf_event.h>
> > +#include <linux/platform_device.h>
> > +#include <linux/types.h>
> > +
> > +#define to_coresight_pmu(p) (container_of(p, struct coresight_pmu, pmu))
> > +
> > +#define CORESIGHT_EXT_ATTR(_name, _func, _config)                    \
> > +     (&((struct dev_ext_attribute[]){                                \
> > +             {                                                       \
> > +                     .attr = __ATTR(_name, 0444, _func, NULL),       \
> > +                     .var = (void *)_config                          \
> > +             }                                                       \
> > +     })[0].attr.attr)
> > +
> > +#define CORESIGHT_FORMAT_ATTR(_name, _config)                                \
> > +     CORESIGHT_EXT_ATTR(_name, coresight_pmu_sysfs_format_show,     \
> > +                        (char *)_config)
> > +
> > +#define CORESIGHT_EVENT_ATTR(_name, _config)                         \
> > +     PMU_EVENT_ATTR_ID(_name, coresight_pmu_sysfs_event_show, _config)
> > +
> > +/**
> > + * This is the default event number for cycle count, if supported, since the
> > + * ARM Coresight PMU specification does not define a standard event code
> > + * for cycle count.
> > + */
> > +#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 31)
> 
> And what do we do when an implementation defines 0x80000000 as one of
> its own event specifiers? The standard cycle count is independent of any
> other events, so it needs to be encoded in a manner which is distinct
> from *any* potentially-valid PMEVTYPER value.

We were thinking that in such a case, the implementer would provide its own
coresight_pmu_impl_ops. To avoid that, I guess we can use config[32] for the
default cycle count event id. The filter value would then need to move to
config1[31:0]. Does that sound reasonable?

> 
> > +
> > +/**
> > + * The ARM Coresight PMU supports up to 256 event counters.
> > + * If the counters are larger-than 32-bits, then the PMU includes at
> > + * most 128 counters.
> > + */
> > +#define CORESIGHT_PMU_MAX_HW_CNTRS 256
> > +
> > +/* The cycle counter, if implemented, is located at counter[31]. */
> > +#define CORESIGHT_PMU_IDX_CCNTR 31
> > +
> > +struct coresight_pmu;
> > +
> > +/* This tracks the events assigned to each counter in the PMU. */
> > +struct coresight_pmu_hw_events {
> > +     /* The events that are active on the PMU for the given index. */
> > +     struct perf_event *events[CORESIGHT_PMU_MAX_HW_CNTRS];
> 
> This is really quite big - 2KB per PMU on 64-bit - given the likelihood
> that typically only a fraction of that might be needed. As mentioned, it
> should already be tickling CONFIG_FRAME_WARN in
> coresight_pmu_validate_group().

We will rework it.

Thanks for all your comments, Robin.

Regards,
Besar

> 
> Thanks,
> Robin.
> 
> > +     /* Each bit indicates a counter is being used (or not) for an event. */
> > +     DECLARE_BITMAP(used_ctrs, CORESIGHT_PMU_MAX_HW_CNTRS);
> > +};
> > +
> > +/* Contains ops to query vendor/implementer specific attribute. */
> > +struct coresight_pmu_impl_ops {
> > +     /* Get event attributes */
> > +     struct attribute **(*get_event_attrs)(
> > +             const struct coresight_pmu *coresight_pmu);
> > +     /* Get format attributes */
> > +     struct attribute **(*get_format_attrs)(
> > +             const struct coresight_pmu *coresight_pmu);
> > +     /* Get string identifier */
> > +     const char *(*get_identifier)(const struct coresight_pmu *coresight_pmu);
> > +     /* Check if the event corresponds to cycle count event */
> > +     bool (*is_cc_event)(const struct perf_event *event);
> > +     /* Decode event type/id from configs */
> > +     u32 (*event_type)(const struct perf_event *event);
> > +     /* Decode filter value from configs */
> > +     u32 (*event_filter)(const struct perf_event *event);
> > +};
> > +
> > +/* Vendor/implementer descriptor. */
> > +struct coresight_pmu_impl {
> > +     u32 pmiidr;
> > +     const struct coresight_pmu_impl_ops *ops;
> > +};
> > +
> > +/* Coresight PMU descriptor. */
> > +struct coresight_pmu {
> > +     struct pmu pmu;
> > +     struct device *dev;
> > +     struct acpi_apmt_node *apmt_node;
> > +     const char *name;
> > +     const char *identifier;
> > +     void __iomem *base0;
> > +     void __iomem *base1;
> > +     int irq;
> > +     cpumask_t associated_cpus;
> > +     cpumask_t active_cpu;
> > +     struct hlist_node cpuhp_node;
> > +
> > +     u32 pmcfgr;
> > +     u32 num_adj_counters;
> > +     u32 num_set_clr_reg;
> > +
> > +     struct coresight_pmu_hw_events hw_events;
> > +
> > +     void (*write_reg64)(u64 val, void __iomem *base, u32 offset);
> > +     u64 (*read_reg64)(void __iomem *base, u32 offset);
> > +
> > +     struct coresight_pmu_impl impl;
> > +};
> > +
> > +/* Default function to show event attribute in sysfs. */
> > +ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
> > +                                    struct device_attribute *attr,
> > +                                    char *buf);
> > +
> > +/* Default function to show format attribute in sysfs. */
> > +ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
> > +                                     struct device_attribute *attr,
> > +                                     char *buf);
> > +
> > +/* Get the default Coresight PMU event attributes. */
> > +struct attribute **
> > +coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu);
> > +
> > +/* Get the default Coresight PMU format attributes. */
> > +struct attribute **
> > +coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu);
> > +
> > +/* Get the default Coresight PMU device identifier. */
> > +const char *
> > +coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu);
> > +
> > +/* Default function to query if an event is a cycle counter event. */
> > +bool coresight_pmu_is_cc_event(const struct perf_event *event);
> > +
> > +/* Default function to query the type/id of an event. */
> > +u32 coresight_pmu_event_type(const struct perf_event *event);
> > +
> > +/* Default function to query the filter value of an event. */
> > +u32 coresight_pmu_event_filter(const struct perf_event *event);
> > +
> > +#endif /* __ARM_CORESIGHT_PMU_H__ */

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-10 11:13       ` Will Deacon
  2022-05-10 18:40         ` Sudeep Holla
@ 2022-05-11  8:44         ` Suzuki K Poulose
  2022-05-11 16:44           ` Besar Wicaksono
  1 sibling, 1 reply; 31+ messages in thread
From: Suzuki K Poulose @ 2022-05-11  8:44 UTC (permalink / raw)
  To: Will Deacon, Sudeep Holla
  Cc: Besar Wicaksono, catalin.marinas, mark.rutland, linux-arm-kernel,
	linux-kernel, linux-tegra, thanu.rangarajan, Michael.Williams,
	treding, jonathanh, vsethi, Mathieu Poirier

On 10/05/2022 12:13, Will Deacon wrote:
> On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
>> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
>>> Cc: Mike Williams, Mathieu Poirier
>>> On 09/05/2022 10:28, Will Deacon wrote:
>>>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
>>>>>    arch/arm64/configs/defconfig                  |    1 +
>>>>>    drivers/perf/Kconfig                          |    2 +
>>>>>    drivers/perf/Makefile                         |    1 +
>>>>>    drivers/perf/coresight_pmu/Kconfig            |   10 +
>>>>>    drivers/perf/coresight_pmu/Makefile           |    7 +
>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>>>>>    9 files changed, 1802 insertions(+)
>>>>
>>>> How does this interact with all the stuff we have under
>>>> drivers/hwtracing/coresight/?
>>>
>>> Absolutely zero, except for the name. The standard
>>> is named "CoreSight PMU" which is a bit unfortunate,
>>> given the only link, AFAIU, with the "CoreSight" architecture
>>> is the Lock Access Register(LAR). For reference, the
>>> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
>>> tracing and the PMU is called "cs_etm" (expands to coresight etm).
>>> Otherwise the standard doesn't have anything to do with what
>>> exists already in the kernel.
> 
> That's... a poor naming choice! But good, if it's entirely separate then I
> don't have to worry about that. Just wanted to make sure we're not going to
> get tangled up in things like ROM tables and Coresight power domains for
> these things.
> 
>>> One potential recommendation for the name is, "Arm PMU"  (The ACPI table is
>>> named Arm PMU Table). But then that could be clashing with the armv8_pmu
>>> :-(.
>>>
>>> Some of the other options are :
>>>
>>> "Arm Generic PMU"
>>> "Arm Uncore PMU"
>>
>> I wasn't sure on this if there is any restriction on usage of this on Arm
>> and hence didn't make the suggestion. But if allowed, this would be my
>> choice too.
> 
> We'd taken to calling them "System" PMUS in the past, so maybe just stick
> with that? I think "Uncore" is Intel terminology so it's probably best to

I thought about that, but there are some IPs named "System Profilers"
(e.g., on the Juno board) which could easily be confused with it. But I
hope their presence in the namespace is much smaller, so I am happy with
that choice. The only other concern is that it doesn't indicate support
for PMUs compliant with a given Arm standard, i.e., people could think
of this as a "single type" of PMU.
So I am wondering if something like "Arm Standard PMU" makes any sense?

Also, I hope the drivers would choose a name indicating the "type" -
<vendor>_<type>_pmu (e.g., nvidia_pcie_pmu, arm_smmuv3_pmu, etc.) when
registering their PMU. That way it is clearer for the PMU, while the
base device could be arm_system_pmu_0 etc.

Suzuki


> avoid it for non-Intel parts.
> 
> Will


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-11  2:46     ` Besar Wicaksono
@ 2022-05-11 10:03       ` Robin Murphy
  0 siblings, 0 replies; 31+ messages in thread
From: Robin Murphy @ 2022-05-11 10:03 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: catalin.marinas, will, mark.rutland, linux-arm-kernel,
	linux-kernel, linux-tegra, sudeep.holla, thanu.rangarajan,
	Michael.Williams, suzuki.poulose, Thierry Reding,
	Jonathan Hunter, Vikram Sethi

On 2022-05-11 03:46, Besar Wicaksono wrote:
[...]
>>> +config ARM_CORESIGHT_PMU
>>> +     tristate "ARM Coresight PMU"
>>> +     depends on ARM64 && ACPI_APMT
>>
>> There shouldn't be any functional dependency on any CPU architecture here.
> 
> The spec is targeted towards ARM based system, shouldn't we explicitly limit it to ARM?

I wouldn't say so. The PMU spec does occasionally make reference to the 
Armv8-A and Armv8-M PMU architectures for comparison, but ultimately 
it's specifying an MMIO register interface for a system component. If 
3rd-party system IP vendors adopt it, who knows what kind of systems 
these PMUs might end up in? (And of course a DT binding will inevitably 
come along once the rest of the market catches up with the ACPI-focused 
early adopters)

In terms of functional dependency plus scope of practical usefulness, I 
think something like:

	depends on ACPI
	depends on ACPI_APMT || COMPILE_TEST

would probably fit the bill until DT support comes along.

[...]
>>> +/*
>>> + * Write to 64-bit register as a pair of 32-bit registers.
>>> + *
>>> + * @val     : 64-bit value to write.
>>> + * @base    : base address of page-0 or page-1 if dual-page ext. is enabled.
>>> + * @offset  : register offset.
>>> + *
>>> + */
>>> +static void write_reg64_lohi(u64 val, void __iomem *base, u32 offset)
>>> +{
>>> +     u32 val_lo, val_hi;
>>> +
>>> +     val_hi = upper_32_bits(val);
>>> +     val_lo = lower_32_bits(val);
>>> +
>>> +     write_reg32(val_lo, base, offset);
>>> +     write_reg32(val_hi, base, offset + 4);
>>> +}
>>
>> #include <linux/io-64-nonatomic-lo-hi.h>
> 
> Thanks for pointing this out. We will replace it with lo_hi_writeq.

The point is more that you can just use writeq() (and readq() where 
atomicity isn't important), and the header will make sure it works wherever.

The significance of not having 64-bit single-copy atomicity should be 
that if the processor issues a 64-bit access, the system may 
*automatically* split it into a pair of 32-bit accesses, e.g. at an 
AXI-to-APB bridge. If making a 64-bit access to a 64-bit register would 
actually fail, that's just broken.

[...]
>>> +static inline bool is_cycle_cntr_idx(const struct perf_event *event)
>>> +{
>>> +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
>>> +     int idx = event->hw.idx;
>>> +
>>> +     return (support_cc(coresight_pmu) && idx == CORESIGHT_PMU_IDX_CCNTR);
>>
>> If we don't support cycle counting, cycles count events should have been
>> rejected in event_init. If they're able to propagate further than that

[apologies for an editing mishap here, this should have continued "then 
something is fundamentally broken."]

> Not sure I understand, do you mean the check for cycle counter support is unnecessary ?
> This function is actually called by coresight_pmu_start, which is after event_init had passed.
> coresight_pmu_start is not aware if cycle counter is supported or not, so we need to keep checking it.

I mean that the support_cc(coresight_pmu) check should only ever need to 
happen *once* in event_init, so if standard cycles events are not 
supported then they are correctly rejected there and then. After that, 
if we see one in event_add and later, then we can simply infer that we 
*do* have a standard cycle counter and go ahead and allocate it.

>>> +}
>>> +
>>> +bool coresight_pmu_is_cc_event(const struct perf_event *event)
>>> +{
>>> +     struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
>>> +     u32 evtype = coresight_pmu->impl.ops->event_type(event);
>>> +
>>> +     return (support_cc(coresight_pmu) &&
>>
>> Ditto.
> 
> This function is called by event_init to validate the event and find available counters.

Right, but it also ends up getting called from other places like 
event_add as well. Like I say, if we're still checking whether an event 
is supported or not by that point, we're doing something wrong.

[...]
>>> +/**
>>> + * This is the default event number for cycle count, if supported, since the
>>> + * ARM Coresight PMU specification does not define a standard event code
>>> + * for cycle count.
>>> + */
>>> +#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 31)
>>
>> And what do we do when an implementation defines 0x80000000 as one of
>> its own event specifiers? The standard cycle count is independent of any
>> other events, so it needs to be encoded in a manner which is distinct
>> from *any* potentially-valid PMEVTYPER value.
> 
> We were thinking that in such case, the implementor would provide coresight_pmu_impl_ops.
> To avoid it, I guess we can use config[32] for the default cycle count event id.
> The filter value will need to be moved to config1[31:0].
> Does it sound reasonable ?

Sure, you can lay out the config fields however you fancy, but since the 
architecture leaves the standard cycles event independent from the 
32-bit IMP-DEF PMEVTYPER specifier, logically we need at least 33 bits 
in some form or other to encode all possible event types in our 
perf_event config.

Thanks,
Robin.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-11  1:29           ` Besar Wicaksono
@ 2022-05-11 12:42             ` Robin Murphy
  2022-05-13  6:16               ` Thanu Rangarajan
  0 siblings, 1 reply; 31+ messages in thread
From: Robin Murphy @ 2022-05-11 12:42 UTC (permalink / raw)
  To: Besar Wicaksono, Sudeep Holla
  Cc: Suzuki K Poulose, Will Deacon, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, thanu.rangarajan,
	Michael.Williams, Thierry Reding, Jonathan Hunter, Vikram Sethi,
	Mathieu Poirier

On 2022-05-11 02:29, Besar Wicaksono wrote:
> 
> 
>> -----Original Message-----
>> From: Sudeep Holla <sudeep.holla@arm.com>
>> Sent: Tuesday, May 10, 2022 1:40 PM
>> To: Besar Wicaksono <bwicaksono@nvidia.com>
>> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>; Will Deacon
>> <will@kernel.org>; Sudeep Holla <sudeep.holla@arm.com>;
>> catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
>> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
>> tegra@vger.kernel.org; thanu.rangarajan@arm.com;
>> Michael.Williams@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
>> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>;
>> Mathieu Poirier <mathieu.poirier@linaro.org>
>> Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
>>
>> External email: Use caution opening links or attachments
>>
>>
>> On Tue, May 10, 2022 at 12:13:19PM +0100, Will Deacon wrote:
>>> On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
>>>> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
>>>>> Cc: Mike Williams, Mathieu Poirier
>>>>> On 09/05/2022 10:28, Will Deacon wrote:
>>>>>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
>>>>>>>    arch/arm64/configs/defconfig                  |    1 +
>>>>>>>    drivers/perf/Kconfig                          |    2 +
>>>>>>>    drivers/perf/Makefile                         |    1 +
>>>>>>>    drivers/perf/coresight_pmu/Kconfig            |   10 +
>>>>>>>    drivers/perf/coresight_pmu/Makefile           |    7 +
>>>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317 +++++++++++++++++
>>>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
>>>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
>>>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>>>>>>>    9 files changed, 1802 insertions(+)
>>>>>>
>>>>>> How does this interact with all the stuff we have under
>>>>>> drivers/hwtracing/coresight/?
>>>>>
>>>>> Absolutely zero, except for the name. The standard
>>>>> is named "CoreSight PMU" which is a bit unfortunate,
>>>>> given the only link, AFAIU, with the "CoreSight" architecture
>>>>> is the Lock Access Register(LAR). For reference, the
>>>>> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
>>>>> tracing and the PMU is called "cs_etm" (expands to coresight etm).
>>>>> Otherwise the standard doesn't have anything to do with what
>>>>> exists already in the kernel.
>>>
>>> That's... a poor naming choice! But good, if it's entirely separate then I
>>> don't have to worry about that. Just wanted to make sure we're not going
>> to
>>> get tangled up in things like ROM tables and Coresight power domains for
>>> these things.
>>>
>>
>> OK, now that triggered another question/thought.
>>
>> 1. Do you need to do active power management for these PMUs ? Or like
>>     CPU PMUs, do you reject entering low power states if there is active
>>     session in progress. If there is active session, runtime PM won't get
>>     triggered but if there is system wide suspend, how is that dealt with ?
>>
> 
> Looking at the other uncore/system PMUs, none of the drivers support PM ops.
> NVIDIA system PMU also does not get power gated and system suspend is not
> supported. But just like other uncore PMU driver, this driver supports CPU hotplug.
> If PM is needed, the required info should have been expressed in ACPI.
> 
>> 2. Assuming you need some sort of PM, and since this is static table(which
>>     I really don't like/prefer but it is out there 🙁), how do you plan to
>>     get the power domain related information.
>>
> 
> I guess the APMT spec in section 2.2 may cover this. If a PMU implementation has
> properties beyond what is defined in the spec, these properties can be described in DSDT.
> The driver doesn’t take care of this currently, so this is room for future improvement.

Yes, I assume it's essentially the same story as for MPAM MSCs in this 
respect. Plus it means that MSI support will be similarly fun, where 
we'll need to have a corresponding DSDT device via which we can request 
the interrupt, because that needs to further correlate to an IORT Named 
Component node describing the ITS mapping. Hopefully we can abstract 
some of that in the APMT code rather than expose it all to the PMU 
driver...

Robin.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-11  8:44         ` Suzuki K Poulose
@ 2022-05-11 16:44           ` Besar Wicaksono
  2022-05-13 12:25             ` Besar Wicaksono
  0 siblings, 1 reply; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-11 16:44 UTC (permalink / raw)
  To: Suzuki K Poulose, Will Deacon, Sudeep Holla
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, Mathieu Poirier



> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> Sent: Wednesday, May 11, 2022 3:45 AM
> To: Will Deacon <will@kernel.org>; Sudeep Holla <sudeep.holla@arm.com>
> Cc: Besar Wicaksono <bwicaksono@nvidia.com>; catalin.marinas@arm.com;
> mark.rutland@arm.com; linux-arm-kernel@lists.infradead.org; linux-
> kernel@vger.kernel.org; linux-tegra@vger.kernel.org;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
> 
> External email: Use caution opening links or attachments
> 
> 
> On 10/05/2022 12:13, Will Deacon wrote:
> > On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
> >> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> >>> Cc: Mike Williams, Mathieu Poirier
> >>> On 09/05/2022 10:28, Will Deacon wrote:
> >>>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> >>>>>    arch/arm64/configs/defconfig                  |    1 +
> >>>>>    drivers/perf/Kconfig                          |    2 +
> >>>>>    drivers/perf/Makefile                         |    1 +
> >>>>>    drivers/perf/coresight_pmu/Kconfig            |   10 +
> >>>>>    drivers/perf/coresight_pmu/Makefile           |    7 +
> >>>>>    .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317
> +++++++++++++++++
> >>>>>    .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> >>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> >>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> >>>>>    9 files changed, 1802 insertions(+)
> >>>>
> >>>> How does this interact with all the stuff we have under
> >>>> drivers/hwtracing/coresight/?
> >>>
> >>> Absolutely zero, except for the name. The standard
> >>> is named "CoreSight PMU" which is a bit unfortunate,
> >>> given the only link, AFAIU, with the "CoreSight" architecture
> >>> is the Lock Access Register(LAR). For reference, the
> >>> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> >>> tracing and the PMU is called "cs_etm" (expands to coresight etm).
> >>> Otherwise the standard doesn't have anything to do with what
> >>> exists already in the kernel.
> >
> > That's... a poor naming choice! But good, if it's entirely separate then I
> > don't have to worry about that. Just wanted to make sure we're not going
> to
> > get tangled up in things like ROM tables and Coresight power domains for
> > these things.
> >
> >>> One potential recommendation for the name is, "Arm PMU"  (The ACPI
> table is
> >>> named Arm PMU Table). But then that could be clashing with the
> armv8_pmu
> >>> :-(.
> >>>
> >>> Some of the other options are :
> >>>
> >>> "Arm Generic PMU"
> >>> "Arm Uncore PMU"
> >>
> >> I wasn't sure on this if there is any restriction on usage of this on Arm
> >> and hence didn't make the suggestion. But if allowed, this would be my
> >> choice too.
> >
> > We'd taken to calling them "System" PMUS in the past, so maybe just stick
> > with that? I think "Uncore" is Intel terminology so it's probably best to
> 
> I thought about that, but there are some IPs named "System Profilers"
> (e.g., on Juno board) which could be easily confused. But I hope their
> population in the name space is much less. So, I am happy with that
> choice. The only other concern is, it doesn't indicate it supports PMUs
> that are compliant to a given Arm Standard. i.e., people could think of
> this as a "single type" of PMU.
> So, I am wondering if something like "Arm Standard PMU" makes any sense ?
> 
> Also, I hope the drivers would choose a name indicating the "type"  -
> <vendor>_<type>_pmu (e.g., nvidia_pcie_pmu, arm_smmuv3_pmu etc)
> while
> registering their PMU. That way it is clearer for the PMU while the
> base device could be arm_system_pmu_0 etc.

Looking at the other PMU drivers, the registered name may include additional
implementation-specific properties, e.g. socket, cluster id, instance number,
memory address, or cache level. Since this is a shared driver, my initial
thought is to register a default arm_coresight_pmu<APMT node id> naming format
for consistency, plus an "identifier" sysfs node to distinguish the PMUs. If an
implementation needs to expose more details about the PMU, they can be
communicated via additional sysfs attributes.

Regards,
Besar

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-11 12:42             ` Robin Murphy
@ 2022-05-13  6:16               ` Thanu Rangarajan
  0 siblings, 0 replies; 31+ messages in thread
From: Thanu Rangarajan @ 2022-05-13  6:16 UTC (permalink / raw)
  To: Robin Murphy, Besar Wicaksono, Sudeep Holla
  Cc: Suzuki Poulose, Will Deacon, Catalin Marinas, Mark Rutland,
	linux-arm-kernel, linux-kernel, linux-tegra,
	Michael Williams (ATG),
	Thierry Reding, Jonathan Hunter, Vikram Sethi, Mathieu Poirier



On 11/05/2022, 18:12, "Robin Murphy" <robin.murphy@arm.com> wrote:

    On 2022-05-11 02:29, Besar Wicaksono wrote:
    > 
    > 
    >> -----Original Message-----
    >> From: Sudeep Holla <sudeep.holla@arm.com>
    >> Sent: Tuesday, May 10, 2022 1:40 PM
    >> To: Besar Wicaksono <bwicaksono@nvidia.com>
    >> Cc: Suzuki K Poulose <suzuki.poulose@arm.com>; Will Deacon
    >> <will@kernel.org>; Sudeep Holla <sudeep.holla@arm.com>;
    >> catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
    >> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
    >> tegra@vger.kernel.org; thanu.rangarajan@arm.com;
    >> Michael.Williams@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
    >> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>;
    >> Mathieu Poirier <mathieu.poirier@linaro.org>
    >> Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
    >>
    >> External email: Use caution opening links or attachments
    >>
    >>
    >> On Tue, May 10, 2022 at 12:13:19PM +0100, Will Deacon wrote:
    >>> On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
    >>>> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
    >>>>> Cc: Mike Williams, Mathieu Poirier
    >>>>> On 09/05/2022 10:28, Will Deacon wrote:
    >>>>>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
    >>>>>>>    arch/arm64/configs/defconfig                  |    1 +
    >>>>>>>    drivers/perf/Kconfig                          |    2 +
    >>>>>>>    drivers/perf/Makefile                         |    1 +
    >>>>>>>    drivers/perf/coresight_pmu/Kconfig            |   10 +
    >>>>>>>    drivers/perf/coresight_pmu/Makefile           |    7 +
    >>>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317
    >> +++++++++++++++++
    >>>>>>>    .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
    >>>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
    >>>>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
    >>>>>>>    9 files changed, 1802 insertions(+)
    >>>>>>
    >>>>>> How does this interact with all the stuff we have under
    >>>>>> drivers/hwtracing/coresight/?
    >>>>>
    >>>>> Absolutely zero, except for the name. The standard
    >>>>> is named "CoreSight PMU" which is a bit unfortunate,
    >>>>> given the only link, AFAIU, with the "CoreSight" architecture
    >>>>> is the Lock Access Register(LAR). For reference, the
    >>>>> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
    >>>>> tracing and the PMU is called "cs_etm" (expands to coresight etm).
    >>>>> Otherwise the standard doesn't have anything to do with what
    >>>>> exists already in the kernel.
    >>>
    >>> That's... a poor naming choice! But good, if it's entirely separate then I
    >>> don't have to worry about that. Just wanted to make sure we're not going
    >> to
    >>> get tangled up in things like ROM tables and Coresight power domains for
    >>> these things.
    >>>
    >>
    >> OK, now that triggered another question/thought.
    >>
    >> 1. Do you need to do active power management for these PMUs ? Or like
    >>     CPU PMUs, do you reject entering low power states if there is active
    >>     session in progress. If there is active session, runtime PM won't get
    >>     triggered but if there is system wide suspend, how is that dealt with ?
    >>
    > 
    > Looking at the other uncore/system PMUs, none of the drivers support PM ops.
    > The NVIDIA system PMU also does not get power gated, and system suspend is not
    > supported. But just like the other uncore PMU drivers, this driver supports CPU
    > hotplug. If PM is needed, the required info should have been expressed in ACPI.
    > 
    >> 2. Assuming you need some sort of PM, and since this is static table(which
    >>     I really don't like/prefer but it is out there 🙁), how do you plan to
    >>     get the power domain related information.
    >>
    > 
    > I guess the APMT spec in section 2.2 may cover this. If a PMU implementation has
    > properties beyond what is defined in the spec, these properties can be described in DSDT.
    > The driver doesn’t take care of this currently, so this is room for future improvement.

    Yes, I assume it's essentially the same story as for MPAM MSCs in this 
    respect. Plus it means that MSI support will be similarly fun, where 
    we'll need to have a corresponding DSDT device via which we can request 
    the interrupt, because that needs to further correlate to an IORT Named 
    Component node describing the ITS mapping. Hopefully we can abstract 
    some of that in the APMT code rather than expose it all to the PMU 
    driver...

[tr] Indeed. The PM properties are optional and only required if the parent
IP block cannot autonomously manage the PMU context on power state
transitions. As such, power management is a dynamic property. Static
properties are best described in a static table, and dynamic properties in
DSDT. Moreover, the static table is useful for miscellaneous properties that
cannot be readily described in DSDT, unless we resort to kludges like _DSD.
The static table is a simple data structure in memory; we don’t need an
interpreter to access its contents.

We do the same for processors: the MADT describes static properties, while
power management is described in DSDT (_LPI).

Coming to MSIs, the named component that describes the MSI is in the IORT, a
_static table_, which in turn points to a device in DSDT.

Regards,
Thanu

    Robin.


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH 0/2] perf: ARM CoreSight PMU support
  2022-05-11 16:44           ` Besar Wicaksono
@ 2022-05-13 12:25             ` Besar Wicaksono
  0 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-13 12:25 UTC (permalink / raw)
  To: Suzuki K Poulose, Will Deacon, Sudeep Holla
  Cc: catalin.marinas, mark.rutland, linux-arm-kernel, linux-kernel,
	linux-tegra, thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, Mathieu Poirier



> -----Original Message-----
> From: Besar Wicaksono
> Sent: Wednesday, May 11, 2022 11:45 AM
> To: Suzuki K Poulose <suzuki.poulose@arm.com>; Will Deacon
> <will@kernel.org>; Sudeep Holla <sudeep.holla@arm.com>
> Cc: catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; thanu.rangarajan@arm.com;
> Michael.Williams@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>;
> Mathieu Poirier <mathieu.poirier@linaro.org>
> Subject: RE: [PATCH 0/2] perf: ARM CoreSight PMU support
> 
> 
> 
> > -----Original Message-----
> > From: Suzuki K Poulose <suzuki.poulose@arm.com>
> > Sent: Wednesday, May 11, 2022 3:45 AM
> > To: Will Deacon <will@kernel.org>; Sudeep Holla
> <sudeep.holla@arm.com>
> > Cc: Besar Wicaksono <bwicaksono@nvidia.com>;
> catalin.marinas@arm.com;
> > mark.rutland@arm.com; linux-arm-kernel@lists.infradead.org; linux-
> > kernel@vger.kernel.org; linux-tegra@vger.kernel.org;
> > thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> > <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> > Sethi <vsethi@nvidia.com>; Mathieu Poirier <mathieu.poirier@linaro.org>
> > Subject: Re: [PATCH 0/2] perf: ARM CoreSight PMU support
> >
> > External email: Use caution opening links or attachments
> >
> >
> > On 10/05/2022 12:13, Will Deacon wrote:
> > > On Tue, May 10, 2022 at 12:07:42PM +0100, Sudeep Holla wrote:
> > >> On Mon, May 09, 2022 at 11:02:23AM +0100, Suzuki K Poulose wrote:
> > >>> Cc: Mike Williams, Mathieu Poirier
> > >>> On 09/05/2022 10:28, Will Deacon wrote:
> > >>>> On Sun, May 08, 2022 at 07:28:08PM -0500, Besar Wicaksono wrote:
> > >>>>>    arch/arm64/configs/defconfig                  |    1 +
> > >>>>>    drivers/perf/Kconfig                          |    2 +
> > >>>>>    drivers/perf/Makefile                         |    1 +
> > >>>>>    drivers/perf/coresight_pmu/Kconfig            |   10 +
> > >>>>>    drivers/perf/coresight_pmu/Makefile           |    7 +
> > >>>>>    .../perf/coresight_pmu/arm_coresight_pmu.c    | 1317
> > +++++++++++++++++
> > >>>>>    .../perf/coresight_pmu/arm_coresight_pmu.h    |  147 ++
> > >>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  300 ++++
> > >>>>>    .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
> > >>>>>    9 files changed, 1802 insertions(+)
> > >>>>
> > >>>> How does this interact with all the stuff we have under
> > >>>> drivers/hwtracing/coresight/?
> > >>>
> > >>> Absolutely zero, except for the name. The standard
> > >>> is named "CoreSight PMU" which is a bit unfortunate,
> > >>> given the only link, AFAIU, with the "CoreSight" architecture
> > >>> is the Lock Access Register(LAR). For reference, the
> > >>> drivers/hwtracing/coresight/ is purely "CoreSight" self-hosted
> > >>> tracing and the PMU is called "cs_etm" (expands to coresight etm).
> > >>> Otherwise the standard doesn't have anything to do with what
> > >>> exists already in the kernel.
> > >
> > > That's... a poor naming choice! But good, if it's entirely separate then I
> > > don't have to worry about that. Just wanted to make sure we're not
> going
> > to
> > > get tangled up in things like ROM tables and Coresight power domains for
> > > these things.
> > >
> > >>> One potential recommendation for the name is, "Arm PMU"  (The
> ACPI
> > table is
> > >>> named Arm PMU Table). But then that could be clashing with the
> > armv8_pmu
> > >>> :-(.
> > >>>
> > >>> Some of the other options are :
> > >>>
> > >>> "Arm Generic PMU"
> > >>> "Arm Uncore PMU"
> > >>
> > >> I wasn't sure on this if there is any restriction on usage of this on Arm
> > >> and hence didn't make the suggestion. But if allowed, this would be my
> > >> choice too.
> > >
> > > We'd taken to calling them "System" PMUS in the past, so maybe just
> stick
> > > with that? I think "Uncore" is Intel terminology so it's probably best to
> >
> > I thought about that, but there are some IPs named "System Profilers"
> > (e.g., on Juno board) which could be easily confused. But I hope their
> > population in the name space is much less. So, I am happy with that
> > choice. The only other concern is, it doesn't indicate it supports PMUs
> > that are compliant to a given Arm Standard. i.e., people could think of
> > this as a "single type" of PMU.
> > So, I am wondering if something like "Arm Standard PMU" makes any
> sense ?
> >
> > Also, I hope the drivers would choose a name indicating the "type"  -
> > <vendor>_<type>_pmu (e.g., nvidia_pcie_pmu, arm_smmuv3_pmu etc)
> > while
> > registering their PMU. That way it is clearer for the PMU while the
> > base device could be arm_system_pmu_0 etc.
> 
> From the other PMU drivers, the registered name may have additional
> properties
> specific to the implementation, e.g. socket, cluster id, instance number,
> memory
> address, cache level. Since this is a shared driver, my initial thought is to
> register
> a default arm_coresight_pmu<APMT node id> naming format for
> consistency and
> "identifier" sysfs node to distinguish the PMUs. If an implementation needs
> to
> expose more details about the PMU, it can be communicated via additional
> sysfs attributes.

Hi Will and Suzuki,

Shall we go ahead with "arm_system_pmu" for the device name ?

> 
> Regards,
> Besar

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 0/2] perf: ARM CoreSight PMU support
  2022-05-09  0:28 [PATCH 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
                   ` (2 preceding siblings ...)
  2022-05-09  9:28 ` [PATCH 0/2] perf: ARM CoreSight PMU support Will Deacon
@ 2022-05-15 16:30 ` Besar Wicaksono
  2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
                     ` (2 more replies)
  3 siblings, 3 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-15 16:30 UTC (permalink / raw)
  To: robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add driver support for the ARM CoreSight PMU device and event attributes for
the NVIDIA implementation. The code is based on the ARM CoreSight PMU
architecture and the ACPI Arm Performance Monitoring Unit table (APMT)
specification below:
 * ARM Coresight PMU:
        https://developer.arm.com/documentation/ihi0091/latest
 * APMT: https://developer.arm.com/documentation/den0117/latest

Notes:
 * There is a concern about the naming of the PMU device.
   Currently the driver probes the "arm-coresight-pmu" device; however, the APMT
   spec supports different kinds of CoreSight PMU based implementations. So it is
   open for discussion whether the name can stay or a "generic" name is required.
   Please see the following thread:
   http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html

Changes from v1:
 * Remove CPU arch dependency.
 * Remove 32-bit read/write helper function and just use read/writel.
 * Add .is_visible into event attribute to filter out cycle counter event.
 * Update pmiidr matching.
 * Remove read-modify-write on PMCR since the driver only writes to PMCR.E.
 * Assign default cycle event outside the 32-bit PMEVTYPER range.
 * Rework the active event and used counter tracking.

Besar Wicaksono (2):
  perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute

 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   11 +
 drivers/perf/coresight_pmu/Makefile           |    7 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1271 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  171 +++
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  292 ++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
 9 files changed, 1773 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h

-- 
2.17.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
@ 2022-05-15 16:30   ` Besar Wicaksono
  2022-05-18  7:16     ` kernel test robot
  2022-05-19  8:52     ` Suzuki K Poulose
  2022-05-15 16:30   ` [PATCH v2 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2 siblings, 2 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-15 16:30 UTC (permalink / raw)
  To: robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add support for the ARM CoreSight PMU driver framework and interfaces.
The driver provides a generic implementation to operate uncore PMUs based
on the ARM CoreSight PMU architecture. The driver also provides an
interface to get vendor/implementation-specific information, for example
event attributes and formatting.

The specification used in this implementation can be found below:
 * ACPI Arm Performance Monitoring Unit table:
        https://developer.arm.com/documentation/den0117/latest
 * ARM Coresight PMU architecture:
        https://developer.arm.com/documentation/ihi0091/latest

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   11 +
 drivers/perf/coresight_pmu/Makefile           |    6 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1267 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  171 +++
 7 files changed, 1459 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 2ca8b1b336d2..8f2120182b25 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
 CONFIG_PHY_TEGRA_XUSB=y
 CONFIG_PHY_AM654_SERDES=m
 CONFIG_PHY_J721E_WIZ=m
+CONFIG_ARM_CORESIGHT_PMU=y
 CONFIG_ARM_SMMU_V3_PMU=m
 CONFIG_FSL_IMX8_DDR_PMU=m
 CONFIG_QCOM_L2_PMU=y
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 1e2d69453771..c4e7cd5b4162 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
 	  Enable perf support for Marvell DDR Performance monitoring
 	  event on CN10K platform.
 
+source "drivers/perf/coresight_pmu/Kconfig"
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 57a279c61df5..4126a04b5583 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
diff --git a/drivers/perf/coresight_pmu/Kconfig b/drivers/perf/coresight_pmu/Kconfig
new file mode 100644
index 000000000000..89174f54c7be
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+
+config ARM_CORESIGHT_PMU
+	tristate "ARM Coresight PMU"
+	depends on ACPI
+	depends on ACPI_APMT || COMPILE_TEST
+	help
+	  Provides support for Performance Monitoring Unit (PMU) events based on
+	  ARM CoreSight PMU architecture.
\ No newline at end of file
diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
new file mode 100644
index 000000000000..a2a7a5fbbc16
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -0,0 +1,6 @@
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+#
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
+	arm_coresight_pmu.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
new file mode 100644
index 000000000000..36ac77ab85cd
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -0,0 +1,1267 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM CoreSight PMU driver.
+ *
+ * This driver adds support for uncore PMU based on ARM CoreSight Performance
+ * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
+ * like other uncore PMUs, it does not support process specific events and
+ * cannot be used in sampling mode.
+ *
+ * This code is based on other uncore PMUs like ARM DSU PMU. It provides a
+ * generic implementation to operate the PMU according to CoreSight PMU
+ * architecture and ACPI ARM PMU table (APMT) documents below:
+ *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
+ *   - APMT document number: ARM DEN0117.
+ * The description of the PMU, like the PMU device identification, available
+ * events, and configuration options, is vendor specific. The driver provides
+ * an interface for vendor-specific code to get this information. This allows
+ * the driver to be shared with PMUs from different vendors.
+ *
+ * CoreSight PMU devices are named as arm_coresight_pmu<node_id> where <node_id>
+ * is APMT node id. The description of the device, like the identifier,
+ * supported events, and formats can be found in sysfs
+ * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
+ *
+ * The user should refer to the vendor technical documentation to get details
+ * about the supported events.
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <linux/ctype.h>
+#include <linux/interrupt.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <acpi/processor.h>
+
+#include "arm_coresight_pmu.h"
+
+#define PMUNAME "arm_coresight_pmu"
+
+#define CORESIGHT_CPUMASK_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_cpumask_show,		\
+			   (unsigned long)_config)
+
+/*
+ * Register offsets based on CoreSight Performance Monitoring Unit Architecture
+ * Document number: ARM-ECM-0640169 00alp6
+ */
+#define PMEVCNTR_LO					0x0
+#define PMEVCNTR_HI					0x4
+#define PMEVTYPER					0x400
+#define PMCCFILTR					0x47C
+#define PMEVFILTR					0xA00
+#define PMCNTENSET					0xC00
+#define PMCNTENCLR					0xC20
+#define PMINTENSET					0xC40
+#define PMINTENCLR					0xC60
+#define PMOVSCLR					0xC80
+#define PMOVSSET					0xCC0
+#define PMCFGR						0xE00
+#define PMCR						0xE04
+#define PMIIDR						0xE08
+
+/* PMCFGR register field */
+#define PMCFGR_NCG_SHIFT				28
+#define PMCFGR_NCG_MASK					0xf
+#define PMCFGR_HDBG					BIT(24)
+#define PMCFGR_TRO					BIT(23)
+#define PMCFGR_SS					BIT(22)
+#define PMCFGR_FZO					BIT(21)
+#define PMCFGR_MSI					BIT(20)
+#define PMCFGR_UEN					BIT(19)
+#define PMCFGR_NA					BIT(17)
+#define PMCFGR_EX					BIT(16)
+#define PMCFGR_CCD					BIT(15)
+#define PMCFGR_CC					BIT(14)
+#define PMCFGR_SIZE_SHIFT				8
+#define PMCFGR_SIZE_MASK				0x3f
+#define PMCFGR_N_SHIFT					0
+#define PMCFGR_N_MASK					0xff
+
+/* PMCR register field */
+#define PMCR_TRO					BIT(11)
+#define PMCR_HDBG					BIT(10)
+#define PMCR_FZO					BIT(9)
+#define PMCR_NA						BIT(8)
+#define PMCR_DP						BIT(5)
+#define PMCR_X						BIT(4)
+#define PMCR_D						BIT(3)
+#define PMCR_C						BIT(2)
+#define PMCR_P						BIT(1)
+#define PMCR_E						BIT(0)
+
+/* PMIIDR register field */
+#define PMIIDR_IMPLEMENTER_MASK				0xFFF
+#define PMIIDR_PRODUCTID_MASK				0xFFF
+#define PMIIDR_PRODUCTID_SHIFT				20
+
+/* Each SET/CLR register supports up to 32 counters. */
+#define CORESIGHT_SET_CLR_REG_COUNTER_NUM		32
+#define CORESIGHT_SET_CLR_REG_COUNTER_SHIFT		5
+
+/* The number of 32-bit SET/CLR register that can be supported. */
+#define CORESIGHT_SET_CLR_REG_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
+
+static_assert((CORESIGHT_SET_CLR_REG_MAX_NUM *
+	       CORESIGHT_SET_CLR_REG_COUNTER_NUM) >=
+	      CORESIGHT_PMU_MAX_HW_CNTRS);
+
+/* Convert counter idx into SET/CLR register number. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx)				\
+	(idx >> CORESIGHT_SET_CLR_REG_COUNTER_SHIFT)
+
+/* Convert counter idx into SET/CLR register bit. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx)				\
+	(idx & (CORESIGHT_SET_CLR_REG_COUNTER_NUM - 1))
+
+#define CORESIGHT_ACTIVE_CPU_MASK			0x0
+#define CORESIGHT_ASSOCIATED_CPU_MASK			0x1
+
+
+/* Check if field f in flags is set with value v */
+#define CHECK_APMT_FLAG(flags, f, v) \
+	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
+
+static unsigned long coresight_pmu_cpuhp_state;
+
+/*
+ * In the CoreSight PMU architecture, all of the MMIO registers are 32-bit
+ * except the counter registers, which can be implemented as 32-bit or 64-bit
+ * depending on the value of the PMCFGR.SIZE field. For 64-bit access,
+ * single-copy 64-bit atomic support is implementation defined. The APMT node
+ * flag is used to identify if the PMU supports 64-bit single-copy atomics. If
+ * not, the driver treats the register as a pair of 32-bit registers.
+ */
+
+/*
+ * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use high-low-high sequence to avoid tearing */
+	do {
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+/* Check if PMU supports 64-bit single copy atomic. */
+static inline bool support_atomic(const struct coresight_pmu *coresight_pmu)
+{
+	return CHECK_APMT_FLAG(coresight_pmu->apmt_node->flags, ATOMIC, SUPP);
+}
+
+/* Check if cycle counter is supported. */
+static inline bool support_cc(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr & PMCFGR_CC);
+}
+
+/* Get counter size. */
+static inline u32 pmcfgr_size(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_SIZE_SHIFT) & PMCFGR_SIZE_MASK;
+}
+
+/* Check if counter is implemented as 64-bit register. */
+static inline bool
+use_64b_counter_reg(const struct coresight_pmu *coresight_pmu)
+{
+	return (pmcfgr_size(coresight_pmu) > 31);
+}
+
+/* Get number of counters, minus one. */
+static inline u32 pmcfgr_n(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_N_SHIFT) & PMCFGR_N_MASK;
+}
+
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct perf_pmu_events_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_attr, attr);
+	return sysfs_emit(buf, "event=0x%llx\n", pmu_attr->id);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_event_show);
+
+/* Default event list. */
+static struct attribute *coresight_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return coresight_pmu_event_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_event_attrs);
+
+umode_t coresight_pmu_event_attr_is_visible(struct kobject *kobj,
+					    struct attribute *attr, int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct coresight_pmu *coresight_pmu =
+		to_coresight_pmu(dev_get_drvdata(dev));
+	struct perf_pmu_events_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr.attr);
+
+	/* Hide cycle event if not supported */
+	if (!support_cc(coresight_pmu) &&
+	    eattr->id == CORESIGHT_PMU_EVT_CYCLES_DEFAULT) {
+		return 0;
+	}
+
+	return attr->mode;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_attr_is_visible);
+
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_format_show);
+
+static struct attribute *coresight_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_FILTER_ATTR,
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return coresight_pmu_format_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_format_attrs);
+
+u32 coresight_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & CORESIGHT_EVENT_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_type);
+
+u32 coresight_pmu_event_filter(const struct perf_event *event)
+{
+	return event->attr.config1 & CORESIGHT_FILTER_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_filter);
+
+static ssize_t coresight_pmu_identifier_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *page)
+{
+	struct coresight_pmu *coresight_pmu =
+		to_coresight_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", coresight_pmu->identifier);
+}
+
+static struct device_attribute coresight_pmu_identifier_attr =
+	__ATTR(identifier, 0444, coresight_pmu_identifier_show, NULL);
+
+static struct attribute *coresight_pmu_identifier_attrs[] = {
+	&coresight_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_identifier_attr_group = {
+	.attrs = coresight_pmu_identifier_attrs,
+};
+
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
+{
+	const char *identifier =
+		devm_kasprintf(coresight_pmu->dev, GFP_KERNEL, "%x",
+			       coresight_pmu->impl.pmiidr);
+	return identifier;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_identifier);
+
+static ssize_t coresight_pmu_cpumask_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case CORESIGHT_ACTIVE_CPU_MASK:
+		cpumask = &coresight_pmu->active_cpu;
+		break;
+	case CORESIGHT_ASSOCIATED_CPU_MASK:
+		cpumask = &coresight_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *coresight_pmu_cpumask_attrs[] = {
+	CORESIGHT_CPUMASK_ATTR(cpumask, CORESIGHT_ACTIVE_CPU_MASK),
+	CORESIGHT_CPUMASK_ATTR(associated_cpus, CORESIGHT_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_cpumask_attr_group = {
+	.attrs = coresight_pmu_cpumask_attrs,
+};
+
+static const struct coresight_pmu_impl_ops default_impl_ops = {
+	.get_event_attrs	= coresight_pmu_get_event_attrs,
+	.get_format_attrs	= coresight_pmu_get_format_attrs,
+	.get_identifier		= coresight_pmu_get_identifier,
+	.is_cc_event		= coresight_pmu_is_cc_event,
+	.event_type		= coresight_pmu_event_type,
+	.event_filter		= coresight_pmu_event_filter,
+	.event_attr_is_visible	= coresight_pmu_event_attr_is_visible
+};
+
+struct impl_match {
+	u32 pmiidr;
+	u32 mask;
+	int (*impl_init_ops)(struct coresight_pmu *coresight_pmu);
+};
+
+static const struct impl_match impl_match[] = {
+	{}
+};
+
+static int coresight_pmu_init_impl_ops(struct coresight_pmu *coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node = coresight_pmu->apmt_node;
+	const struct impl_match *match = impl_match;
+
+	/*
+	 * Get the PMU implementer and product id from the APMT node.
+	 * If the APMT node doesn't have the implementer/product id, try to
+	 * get it from PMIIDR.
+	 */
+	coresight_pmu->impl.pmiidr =
+		(apmt_node->impl_id) ? apmt_node->impl_id :
+				       readl(coresight_pmu->base0 + PMIIDR);
+
+	/* Find implementer specific attribute ops. */
+	for (; match->pmiidr; match++) {
+		if ((match->pmiidr & match->mask) ==
+		    (coresight_pmu->impl.pmiidr & match->mask))
+			return match->impl_init_ops(coresight_pmu);
+	}
+
+	/* No implementer specific attribute ops found, use the default. */
+	coresight_pmu->impl.ops = &default_impl_ops;
+	return 0;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_event_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *event_group;
+	struct device *dev = coresight_pmu->dev;
+
+	event_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!event_group)
+		return NULL;
+
+	event_group->name = "events";
+	event_group->attrs =
+		coresight_pmu->impl.ops->get_event_attrs(coresight_pmu);
+	event_group->is_visible =
+		coresight_pmu->impl.ops->event_attr_is_visible;
+
+	return event_group;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_format_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = coresight_pmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs =
+		coresight_pmu->impl.ops->get_format_attrs(coresight_pmu);
+
+	return format_group;
+}
+
+static struct attribute_group **
+coresight_pmu_alloc_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	const struct coresight_pmu_impl_ops *impl_ops;
+	struct attribute_group **attr_groups = NULL;
+	struct device *dev = coresight_pmu->dev;
+	int ret;
+
+	ret = coresight_pmu_init_impl_ops(coresight_pmu);
+	if (ret)
+		return NULL;
+
+	impl_ops = coresight_pmu->impl.ops;
+
+	coresight_pmu->identifier = impl_ops->get_identifier(coresight_pmu);
+
+	attr_groups = devm_kzalloc(dev, 5 * sizeof(struct attribute_group *),
+				   GFP_KERNEL);
+	if (!attr_groups)
+		return NULL;
+
+	attr_groups[0] = coresight_pmu_alloc_event_attr_group(coresight_pmu);
+	attr_groups[1] = coresight_pmu_alloc_format_attr_group(coresight_pmu);
+	attr_groups[2] = &coresight_pmu_identifier_attr_group;
+	attr_groups[3] = &coresight_pmu_cpumask_attr_group;
+
+	if (!attr_groups[0] || !attr_groups[1])
+		return NULL;
+
+	return attr_groups;
+}
+
+static inline void
+coresight_pmu_reset_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr = 0;
+
+	pmcr |= PMCR_P;
+	pmcr |= PMCR_C;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static inline void
+coresight_pmu_start_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = PMCR_E;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static inline void
+coresight_pmu_stop_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = 0;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static void coresight_pmu_enable(struct pmu *pmu)
+{
+	bool disabled;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	disabled = bitmap_empty(coresight_pmu->hw_events.used_ctrs,
+				coresight_pmu->num_logical_counters);
+
+	if (disabled)
+		return;
+
+	coresight_pmu_start_counters(coresight_pmu);
+}
+
+static void coresight_pmu_disable(struct pmu *pmu)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	coresight_pmu_stop_counters(coresight_pmu);
+}
+
+bool coresight_pmu_is_cc_event(const struct perf_event *event)
+{
+	return (event->attr.config == CORESIGHT_PMU_EVT_CYCLES_DEFAULT);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_is_cc_event);
+
+static int
+coresight_pmu_get_event_idx(struct coresight_pmu_hw_events *hw_events,
+			    struct perf_event *event)
+{
+	int idx;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (support_cc(coresight_pmu)) {
+		if (coresight_pmu->impl.ops->is_cc_event(event)) {
+			/* Claim the cycle counter if it is free. */
+			if (test_and_set_bit(coresight_pmu->cc_logical_idx,
+					     hw_events->used_ctrs))
+				return -EAGAIN;
+
+			return coresight_pmu->cc_logical_idx;
+		}
+
+		/*
+		 * Search for a regular counter in the used-counter bitmap.
+		 * The cycle counter bit divides the bitmap into two parts;
+		 * search the first half, then the second, to skip the cycle
+		 * counter bit.
+		 */
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  coresight_pmu->cc_logical_idx);
+		if (idx >= coresight_pmu->cc_logical_idx) {
+			idx = find_next_zero_bit(
+				hw_events->used_ctrs,
+				coresight_pmu->num_logical_counters,
+				coresight_pmu->cc_logical_idx + 1);
+		}
+	} else {
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  coresight_pmu->num_logical_counters);
+	}
+
+	if (idx >= coresight_pmu->num_logical_counters)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
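[Illustration only, not part of the patch] The split bitmap search above can be sketched standalone, with a plain `uint64_t` replacing the kernel bitmap helpers and `-1` standing in for `-EAGAIN`:

```c
#include <stdint.h>

/* Return the first clear bit in [from, to), or `to` if none. */
static int find_zero(uint64_t used, int from, int to)
{
	for (int i = from; i < to; i++)
		if (!(used & (1ull << i)))
			return i;
	return to;
}

/* Logical-index allocation: the cycle counter bit splits the bitmap in two. */
static int get_event_idx(uint64_t *used, int nctrs, int cc_idx, int is_cc)
{
	int idx;

	if (is_cc) {
		if (*used & (1ull << cc_idx))
			return -1;		/* cycle counter busy */
		idx = cc_idx;
	} else {
		idx = find_zero(*used, 0, cc_idx);
		if (idx >= cc_idx)
			idx = find_zero(*used, cc_idx + 1, nctrs);
		if (idx >= nctrs)
			return -1;		/* all regular counters busy */
	}

	*used |= 1ull << idx;
	return idx;
}
```

A regular event never lands on `cc_idx`: the first search stops just below it, and the second starts just above it.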
+static bool
+coresight_pmu_validate_event(struct pmu *pmu,
+			     struct coresight_pmu_hw_events *hw_events,
+			     struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return (coresight_pmu_get_event_idx(hw_events, event) >= 0);
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool coresight_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct coresight_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events,
+						  sibling))
+			return false;
+	}
+
+	return coresight_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int coresight_pmu_event_init(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu;
+	struct hw_perf_event *hwc = &event->hw;
+
+	coresight_pmu = to_coresight_pmu(event->pmu);
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attach to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &coresight_pmu->associated_cpus)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&coresight_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!coresight_pmu_validate_group(event))
+		return -EINVAL;
+
+	/*
+	 * The logical counter id is tracked with hw_perf_event.extra_reg.idx.
+	 * The physical counter id is tracked with hw_perf_event.idx.
+	 * We don't assign an index until we actually place the event onto
+	 * hardware. Use -1 to signify that we haven't decided where to put it
+	 * yet.
+	 */
+	hwc->idx = -1;
+	hwc->extra_reg.idx = -1;
+	hwc->config_base = coresight_pmu->impl.ops->event_type(event);
+
+	return 0;
+}
+
+static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
+{
+	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
+}
+
+static void coresight_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	u32 offset;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+
+		writeq(val, coresight_pmu->base1 + offset);
+	} else {
+		offset = counter_offset(sizeof(u32), event->hw.idx);
+
+		writel(lower_32_bits(val), coresight_pmu->base1 + offset);
+	}
+}
+
+static u64 coresight_pmu_read_counter(struct perf_event *event)
+{
+	u32 offset;
+	const void __iomem *counter_addr;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+		counter_addr = coresight_pmu->base1 + offset;
+
+		return support_atomic(coresight_pmu) ?
+			       readq(counter_addr) :
+			       read_reg64_hilohi(counter_addr);
+	}
+
+	offset = counter_offset(sizeof(u32), event->hw.idx);
+	return readl(coresight_pmu->base1 + offset);
+}
+
+/*
+ * coresight_pmu_set_event_period: Set the period for the counter.
+ *
+ * To handle cases of extreme interrupt latency, we program the counter to
+ * half of its maximum value so that it cannot wrap around again before the
+ * overflow interrupt is serviced.
+ */
+static void coresight_pmu_set_event_period(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	u64 val = GENMASK_ULL(pmcfgr_size(coresight_pmu), 0) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	coresight_pmu_write_counter(event, val);
+}
+
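[Illustration only, not part of the patch] The half-of-max period and the masked delta in `coresight_pmu_event_update()` can be sketched together. Here `size` is the highest counter bit index (31 for 32-bit counters, 63 for 64-bit), matching how `GENMASK_ULL(pmcfgr_size(...), 0)` is used in the driver:

```c
#include <stdint.h>

/* All-ones mask for a counter whose highest bit index is `size`. */
static uint64_t counter_mask(unsigned int size)
{
	return (size >= 63) ? ~0ull : ((1ull << (size + 1)) - 1);
}

/* Program half of the max count so a late overflow IRQ cannot lose a wrap. */
static uint64_t half_period(unsigned int size)
{
	return counter_mask(size) >> 1;
}

/* Delta between two reads, correct across a single counter wraparound. */
static uint64_t count_delta(uint64_t prev, uint64_t now, unsigned int size)
{
	return (now - prev) & counter_mask(size);
}
```

Starting at half of max, the counter must tick through the remaining half of its range before overflowing again, which bounds how late the interrupt may be serviced without losing a wrap.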
+static void coresight_pmu_enable_counter(struct coresight_pmu *coresight_pmu,
+					 int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENSET + (4 * reg_id);
+	cnten_off = PMCNTENSET + (4 * reg_id);
+
+	writel(BIT(reg_bit), coresight_pmu->base0 + inten_off);
+	writel(BIT(reg_bit), coresight_pmu->base0 + cnten_off);
+}
+
+static void coresight_pmu_disable_counter(struct coresight_pmu *coresight_pmu,
+					  int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENCLR + (4 * reg_id);
+	cnten_off = PMCNTENCLR + (4 * reg_id);
+
+	writel(BIT(reg_bit), coresight_pmu->base0 + cnten_off);
+	writel(BIT(reg_bit), coresight_pmu->base0 + inten_off);
+}
+
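[Illustration only, not part of the patch] The enable/disable paths above split a counter index into a set/clear register and a bit within it. A standalone sketch of what `CORESIGHT_IDX_TO_SET_CLR_REG_ID`/`_BIT` presumably compute, assuming each 32-bit PMCNTENSET/PMINTENSET register covers 32 counters:

```c
#include <stdint.h>

/* Each 32-bit set/clear register covers 32 counters. */
static void idx_to_set_clr(unsigned int idx, uint32_t *reg_off, uint32_t *bit)
{
	*reg_off = 4 * (idx / 32);	/* byte offset of the register */
	*bit = idx % 32;		/* bit within that register */
}
```

The driver then adds `reg_off` to the PMINTENSET/PMCNTENSET (or CLR) base offset and writes `BIT(bit)`.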
+static void coresight_pmu_event_update(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = coresight_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	delta = (now - prev) & GENMASK_ULL(pmcfgr_size(coresight_pmu), 0);
+	local64_add(delta, &event->count);
+}
+
+static inline void coresight_pmu_set_event(struct coresight_pmu *coresight_pmu,
+					   struct hw_perf_event *hwc)
+{
+	u32 offset = PMEVTYPER + (4 * hwc->idx);
+
+	writel(hwc->config_base, coresight_pmu->base0 + offset);
+}
+
+static inline void
+coresight_pmu_set_ev_filter(struct coresight_pmu *coresight_pmu,
+			    struct hw_perf_event *hwc, u32 filter)
+{
+	u32 offset = PMEVFILTR + (4 * hwc->idx);
+
+	writel(filter, coresight_pmu->base0 + offset);
+}
+
+static inline void
+coresight_pmu_set_cc_filter(struct coresight_pmu *coresight_pmu, u32 filter)
+{
+	u32 offset = PMCCFILTR;
+
+	writel(filter, coresight_pmu->base0 + offset);
+}
+
+static void coresight_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 filter;
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
+
+	coresight_pmu_set_event_period(event);
+
+	filter = coresight_pmu->impl.ops->event_filter(event);
+
+	if (event->hw.extra_reg.idx == coresight_pmu->cc_logical_idx) {
+		coresight_pmu_set_cc_filter(coresight_pmu, filter);
+	} else {
+		coresight_pmu_set_event(coresight_pmu, hwc);
+		coresight_pmu_set_ev_filter(coresight_pmu, hwc, filter);
+	}
+
+	hwc->state = 0;
+
+	coresight_pmu_enable_counter(coresight_pmu, hwc->idx);
+}
+
+static void coresight_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	coresight_pmu_disable_counter(coresight_pmu, hwc->idx);
+	coresight_pmu_event_update(event);
+
+	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static inline u32 to_phys_idx(struct coresight_pmu *coresight_pmu, u32 idx)
+{
+	return (idx == coresight_pmu->cc_logical_idx) ?
+		       CORESIGHT_PMU_IDX_CCNTR : idx;
+}
+
+static int coresight_pmu_add(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &coresight_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = coresight_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = to_phys_idx(coresight_pmu, idx);
+	hwc->extra_reg.idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		coresight_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void coresight_pmu_del(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->extra_reg.idx;
+
+	coresight_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void coresight_pmu_read(struct perf_event *event)
+{
+	coresight_pmu_event_update(event);
+}
+
+static int coresight_pmu_alloc(struct platform_device *pdev,
+			       struct coresight_pmu **coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	struct device *dev;
+	struct coresight_pmu *pmu;
+
+	dev = &pdev->dev;
+	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
+	if (!apmt_node) {
+		dev_err(dev, "failed to get APMT node\n");
+		return -ENODEV;
+	}
+
+	pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
+	if (!pmu)
+		return -ENOMEM;
+
+	*coresight_pmu = pmu;
+
+	pmu->dev = dev;
+	pmu->apmt_node = apmt_node;
+	pmu->name =
+		devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);
+	if (!pmu->name)
+		return -ENOMEM;
+
+	platform_set_drvdata(pdev, pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_init_mmio(struct coresight_pmu *coresight_pmu)
+{
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Base address for page 0. */
+	coresight_pmu->base0 = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(coresight_pmu->base0)) {
+		dev_err(dev, "ioremap failed for page-0 resource\n");
+		return PTR_ERR(coresight_pmu->base0);
+	}
+
+	/* Base address for page 1 if supported. Otherwise point it to page 0. */
+	coresight_pmu->base1 = coresight_pmu->base0;
+	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
+		coresight_pmu->base1 = devm_platform_ioremap_resource(pdev, 1);
+		if (IS_ERR(coresight_pmu->base1)) {
+			dev_err(dev, "ioremap failed for page-1 resource\n");
+			return PTR_ERR(coresight_pmu->base1);
+		}
+	}
+
+	coresight_pmu->pmcfgr = readl(coresight_pmu->base0 + PMCFGR);
+
+	coresight_pmu->num_logical_counters = pmcfgr_n(coresight_pmu) + 1;
+
+	coresight_pmu->cc_logical_idx = CORESIGHT_PMU_MAX_HW_CNTRS;
+
+	if (support_cc(coresight_pmu)) {
+		/*
+		 * The last logical counter is mapped to the cycle counter if
+		 * there is a gap between the regular counters and the cycle
+		 * counter. Otherwise, logical and physical indices have a
+		 * 1-to-1 mapping.
+		 */
+		coresight_pmu->cc_logical_idx =
+			(coresight_pmu->num_logical_counters <=
+			 CORESIGHT_PMU_IDX_CCNTR) ?
+				coresight_pmu->num_logical_counters - 1 :
+				CORESIGHT_PMU_IDX_CCNTR;
+	}
+
+	coresight_pmu->num_set_clr_reg =
+		DIV_ROUND_UP(coresight_pmu->num_logical_counters,
+			 CORESIGHT_SET_CLR_REG_COUNTER_NUM);
+
+	coresight_pmu->hw_events.events =
+		devm_kzalloc(dev,
+			     sizeof(*coresight_pmu->hw_events.events) *
+				     coresight_pmu->num_logical_counters,
+			     GFP_KERNEL);
+
+	if (!coresight_pmu->hw_events.events)
+		return -ENOMEM;
+
+	return 0;
+}
+
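[Illustration only, not part of the patch] The logical-to-physical index mapping set up above can be sketched standalone, with constants matching the driver (cycle counter at physical slot 31; the last logical index is reused for it whenever the PMU has 31 or fewer other counters):

```c
#define SIM_IDX_CCNTR 31	/* physical slot of the cycle counter */

/* Logical index assigned to the cycle counter. */
static int cc_logical_idx(unsigned int num_logical_counters)
{
	return (num_logical_counters <= SIM_IDX_CCNTR) ?
		       (int)num_logical_counters - 1 : SIM_IDX_CCNTR;
}

/* Map a logical counter index to the physical one. */
static int to_phys(int idx, int cc_idx)
{
	return (idx == cc_idx) ? SIM_IDX_CCNTR : idx;
}
```

For example, with 9 logical counters (8 regular plus cycle), the cycle counter is logical index 8 but physical index 31; with 33 or more, logical 31 already coincides with the physical cycle counter slot.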
+static inline int
+coresight_pmu_get_reset_overflow(struct coresight_pmu *coresight_pmu,
+				 u32 *pmovs)
+{
+	int i;
+	u32 pmovclr_offset = PMOVSCLR;
+	u32 has_overflowed = 0;
+
+	for (i = 0; i < coresight_pmu->num_set_clr_reg; ++i) {
+		pmovs[i] = readl(coresight_pmu->base1 + pmovclr_offset);
+		has_overflowed |= pmovs[i];
+		writel(pmovs[i], coresight_pmu->base1 + pmovclr_offset);
+		pmovclr_offset += sizeof(u32);
+	}
+
+	return has_overflowed != 0;
+}
+
+static irqreturn_t coresight_pmu_handle_irq(int irq_num, void *dev)
+{
+	int idx, has_overflowed;
+	struct perf_event *event;
+	struct coresight_pmu *coresight_pmu = dev;
+	u32 pmovs[CORESIGHT_SET_CLR_REG_MAX_NUM] = { 0 };
+	bool handled = false;
+
+	coresight_pmu_stop_counters(coresight_pmu);
+
+	has_overflowed = coresight_pmu_get_reset_overflow(coresight_pmu, pmovs);
+	if (!has_overflowed)
+		goto done;
+
+	for_each_set_bit(idx, coresight_pmu->hw_events.used_ctrs,
+			coresight_pmu->num_logical_counters) {
+		event = coresight_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		if (!test_bit(event->hw.idx, (unsigned long *)pmovs))
+			continue;
+
+		coresight_pmu_event_update(event);
+		coresight_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+done:
+	coresight_pmu_start_counters(coresight_pmu);
+	return IRQ_RETVAL(handled);
+}
+
+static int coresight_pmu_request_irq(struct coresight_pmu *coresight_pmu)
+{
+	int irq, ret;
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Skip IRQ request if the PMU does not support overflow interrupt. */
+	if (apmt_node->ovflw_irq == 0)
+		return 0;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ret = devm_request_irq(dev, irq, coresight_pmu_handle_irq,
+			       IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
+			       coresight_pmu);
+	if (ret) {
+		dev_err(dev, "Could not request IRQ %d\n", irq);
+		return ret;
+	}
+
+	coresight_pmu->irq = irq;
+
+	return 0;
+}
+
+static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
+{
+	u32 acpi_uid;
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct acpi_device *acpi_dev;
+
+	if (!cpu_dev)
+		return -ENODEV;
+
+	/* Resolve the ACPI companion only after the NULL check above. */
+	acpi_dev = ACPI_COMPANION(cpu_dev);
+	while (acpi_dev) {
+		if (!strcmp(acpi_device_hid(acpi_dev),
+			    ACPI_PROCESSOR_CONTAINER_HID) &&
+		    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
+		    acpi_uid == container_uid)
+			return 0;
+
+		acpi_dev = acpi_dev->parent;
+	}
+
+	return -ENODEV;
+}
+
+static int coresight_pmu_get_cpus(struct coresight_pmu *coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	int affinity_flag;
+	int cpu;
+
+	apmt_node = coresight_pmu->apmt_node;
+	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
+
+	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
+		for_each_possible_cpu(cpu) {
+			if (apmt_node->proc_affinity ==
+			    get_acpi_id_for_cpu(cpu)) {
+				cpumask_set_cpu(
+					cpu, &coresight_pmu->associated_cpus);
+				break;
+			}
+		}
+	} else {
+		for_each_possible_cpu(cpu) {
+			if (coresight_pmu_find_cpu_container(
+				    cpu, apmt_node->proc_affinity))
+				continue;
+
+			cpumask_set_cpu(cpu, &coresight_pmu->associated_cpus);
+		}
+	}
+
+	return 0;
+}
+
+static int coresight_pmu_register_pmu(struct coresight_pmu *coresight_pmu)
+{
+	int ret;
+	struct attribute_group **attr_groups;
+
+	attr_groups = coresight_pmu_alloc_attr_group(coresight_pmu);
+	if (!attr_groups)
+		return -ENOMEM;
+
+	ret = cpuhp_state_add_instance(coresight_pmu_cpuhp_state,
+				       &coresight_pmu->cpuhp_node);
+	if (ret)
+		return ret;
+
+	coresight_pmu->pmu = (struct pmu){
+		.task_ctx_nr	= perf_invalid_context,
+		.module		= THIS_MODULE,
+		.pmu_enable	= coresight_pmu_enable,
+		.pmu_disable	= coresight_pmu_disable,
+		.event_init	= coresight_pmu_event_init,
+		.add		= coresight_pmu_add,
+		.del		= coresight_pmu_del,
+		.start		= coresight_pmu_start,
+		.stop		= coresight_pmu_stop,
+		.read		= coresight_pmu_read,
+		.attr_groups	= (const struct attribute_group **)attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
+	};
+
+	/* Hardware counter init */
+	coresight_pmu_stop_counters(coresight_pmu);
+	coresight_pmu_reset_counters(coresight_pmu);
+
+	ret = perf_pmu_register(&coresight_pmu->pmu, coresight_pmu->name, -1);
+	if (ret) {
+		cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+					    &coresight_pmu->cpuhp_node);
+	}
+
+	return ret;
+}
+
+static int coresight_pmu_device_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct coresight_pmu *coresight_pmu;
+
+	ret = coresight_pmu_alloc(pdev, &coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_init_mmio(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_request_irq(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_get_cpus(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_register_pmu(coresight_pmu);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int coresight_pmu_device_remove(struct platform_device *pdev)
+{
+	struct coresight_pmu *coresight_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&coresight_pmu->pmu);
+	cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+				    &coresight_pmu->cpuhp_node);
+
+	return 0;
+}
+
+static struct platform_driver coresight_pmu_driver = {
+	.driver = {
+			.name = "arm-coresight-pmu",
+			.suppress_bind_attrs = true,
+		},
+	.probe = coresight_pmu_device_probe,
+	.remove = coresight_pmu_device_remove,
+};
+
+static void coresight_pmu_set_active_cpu(int cpu,
+					 struct coresight_pmu *coresight_pmu)
+{
+	cpumask_set_cpu(cpu, &coresight_pmu->active_cpu);
+
+	/* The PMU may not have an overflow interrupt. */
+	if (coresight_pmu->irq)
+		WARN_ON(irq_set_affinity(coresight_pmu->irq,
+					 &coresight_pmu->active_cpu));
+}
+
+static int coresight_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &coresight_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&coresight_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	coresight_pmu_set_active_cpu(cpu, coresight_pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct cpumask online_supported;
+
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &coresight_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	cpumask_and(&online_supported, &coresight_pmu->associated_cpus,
+		    cpu_online_mask);
+	dst = cpumask_any_but(&online_supported, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&coresight_pmu->pmu, cpu, dst);
+	coresight_pmu_set_active_cpu(dst, coresight_pmu);
+
+	return 0;
+}
+
+static int __init coresight_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, PMUNAME,
+				      coresight_pmu_cpu_online,
+				      coresight_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	coresight_pmu_cpuhp_state = ret;
+	return platform_driver_register(&coresight_pmu_driver);
+}
+
+static void __exit coresight_pmu_exit(void)
+{
+	platform_driver_unregister(&coresight_pmu_driver);
+	cpuhp_remove_multi_state(coresight_pmu_cpuhp_state);
+}
+
+module_init(coresight_pmu_init);
+module_exit(coresight_pmu_exit);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.h b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
new file mode 100644
index 000000000000..963f7483dc36
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
@@ -0,0 +1,171 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * ARM CoreSight PMU driver.
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ */
+
+#ifndef __ARM_CORESIGHT_PMU_H__
+#define __ARM_CORESIGHT_PMU_H__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#define to_coresight_pmu(p) (container_of(p, struct coresight_pmu, pmu))
+
+#define CORESIGHT_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define CORESIGHT_FORMAT_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_sysfs_format_show,	\
+			   (char *)_config)
+
+#define CORESIGHT_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, coresight_pmu_sysfs_event_show, _config)
+
+
+/* Default event id mask */
+#define CORESIGHT_EVENT_MASK				0xFFFFFFFFULL
+
+/* Default filter value mask */
+#define CORESIGHT_FILTER_MASK				0xFFFFFFFFULL
+
+/* Default event format */
+#define CORESIGHT_FORMAT_EVENT_ATTR CORESIGHT_FORMAT_ATTR(event, "config:0-32")
+
+/* Default filter format */
+#define CORESIGHT_FORMAT_FILTER_ATTR                                           \
+	CORESIGHT_FORMAT_ATTR(filter, "config1:0-31")
+
+/*
+ * This is the default event number for cycle count, if supported, since the
+ * ARM Coresight PMU specification does not define a standard event code
+ * for cycle count.
+ */
+#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 32)
+
+/*
+ * The ARM Coresight PMU supports up to 256 event counters.
+ * If the counters are larger than 32 bits, the PMU includes at
+ * most 128 counters.
+ */
+#define CORESIGHT_PMU_MAX_HW_CNTRS 256
+
+/* The cycle counter, if implemented, is located at counter[31]. */
+#define CORESIGHT_PMU_IDX_CCNTR 31
+
+struct coresight_pmu;
+
+/* This tracks the events assigned to each counter in the PMU. */
+struct coresight_pmu_hw_events {
+	/* The events that are active on the PMU for a given logical index. */
+	struct perf_event **events;
+
+	/*
+	 * Each bit indicates whether a logical counter is in use by an
+	 * event. If the cycle counter is supported and there is a gap between
+	 * the regular counters and the cycle counter, the last logical
+	 * counter is mapped to the cycle counter. Otherwise, logical and
+	 * physical indices have a 1-to-1 mapping.
+	 */
+	DECLARE_BITMAP(used_ctrs, CORESIGHT_PMU_MAX_HW_CNTRS);
+};
+
+/* Contains ops to query vendor/implementer specific attributes. */
+struct coresight_pmu_impl_ops {
+	/* Get event attributes */
+	struct attribute **(*get_event_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get format attributes */
+	struct attribute **(*get_format_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get string identifier */
+	const char *(*get_identifier)(const struct coresight_pmu *coresight_pmu);
+	/* Check if the event corresponds to a cycle count event */
+	bool (*is_cc_event)(const struct perf_event *event);
+	/* Decode event type/id from configs */
+	u32 (*event_type)(const struct perf_event *event);
+	/* Decode filter value from configs */
+	u32 (*event_filter)(const struct perf_event *event);
+	/* Hide/show unsupported events */
+	umode_t (*event_attr_is_visible)(struct kobject *kobj,
+					 struct attribute *attr, int unused);
+};
+
+/* Vendor/implementer descriptor. */
+struct coresight_pmu_impl {
+	u32 pmiidr;
+	const struct coresight_pmu_impl_ops *ops;
+};
+
+/* Coresight PMU descriptor. */
+struct coresight_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_apmt_node *apmt_node;
+	const char *name;
+	const char *identifier;
+	void __iomem *base0;
+	void __iomem *base1;
+	int irq;
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node cpuhp_node;
+
+	u32 pmcfgr;
+	u32 num_logical_counters;
+	u32 num_set_clr_reg;
+	int cc_logical_idx;
+
+	struct coresight_pmu_hw_events hw_events;
+
+	struct coresight_pmu_impl impl;
+};
+
+/* Default function to show event attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf);
+
+/* Default function to show format attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf);
+
+/* Get the default Coresight PMU event attributes. */
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default Coresight PMU format attributes. */
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default Coresight PMU device identifier. */
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu);
+
+/* Default function to query if an event is a cycle counter event. */
+bool coresight_pmu_is_cc_event(const struct perf_event *event);
+
+/* Default function to query the type/id of an event. */
+u32 coresight_pmu_event_type(const struct perf_event *event);
+
+/* Default function to query the filter value of an event. */
+u32 coresight_pmu_event_filter(const struct perf_event *event);
+
+/* Default function that hides the default cycle event id if not supported. */
+umode_t coresight_pmu_event_attr_is_visible(struct kobject *kobj,
+					    struct attribute *attr, int unused);
+
+#endif /* __ARM_CORESIGHT_PMU_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
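[Editorial note: the counter bookkeeping in arm_coresight_pmu.h above (cycle counter pinned at physical index 31 per CORESIGHT_PMU_IDX_CCNTR, with the last logical counter standing in for it when the regular counters leave a gap) can be illustrated with a small userspace sketch. The function and parameter names below are stand-ins for illustration only, not part of the patch:]

```c
/*
 * Illustrative userspace sketch (not from the patch) of the
 * logical-to-physical counter index mapping described in the header
 * comment: the cycle counter, if implemented, sits at physical index
 * 31, and when the regular counters leave a gap below it, the last
 * logical counter is mapped onto it.
 */
#define SKETCH_IDX_CCNTR 31	/* mirrors CORESIGHT_PMU_IDX_CCNTR */

static int logical_to_physical(int logical_idx, int num_regular, int has_ccntr)
{
	/* Last logical index stands in for the cycle counter. */
	if (has_ccntr && num_regular < SKETCH_IDX_CCNTR &&
	    logical_idx == num_regular)
		return SKETCH_IDX_CCNTR;

	/* Otherwise logical and physical indices match 1-to-1. */
	return logical_idx;
}
```

[With 8 regular counters plus a cycle counter, logical index 8 maps to physical index 31, while indices 0-7 map directly.]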

* [PATCH v2 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
  2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
  2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-05-15 16:30   ` Besar Wicaksono
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-15 16:30 UTC (permalink / raw)
  To: robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, suzuki.poulose, treding,
	jonathanh, vsethi, Besar Wicaksono

Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
Fabric (MCF) PMU attributes for CoreSight PMU implementation in
NVIDIA devices.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 drivers/perf/coresight_pmu/Makefile           |   3 +-
 .../perf/coresight_pmu/arm_coresight_pmu.c    |   4 +
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  | 292 ++++++++++++++++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |  17 +
 4 files changed, 315 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h

diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
index a2a7a5fbbc16..181b1b0dbaa1 100644
--- a/drivers/perf/coresight_pmu/Makefile
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -3,4 +3,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
-	arm_coresight_pmu.o
+	arm_coresight_pmu.o \
+	arm_coresight_pmu_nvidia.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
index 36ac77ab85cd..85ef653e238d 100644
--- a/drivers/perf/coresight_pmu/arm_coresight_pmu.c
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -40,6 +40,7 @@
 #include <acpi/processor.h>
 
 #include "arm_coresight_pmu.h"
+#include "arm_coresight_pmu_nvidia.h"
 
 #define PMUNAME "arm_coresight_pmu"
 
@@ -351,6 +352,9 @@ struct impl_match {
 };
 
 static const struct impl_match impl_match[] = {
+	{ .pmiidr = 0x36B,
+	  .mask = PMIIDR_IMPLEMENTER_MASK,
+	  .impl_init_ops = nv_coresight_init_ops },
 	{}
 };
 
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
new file mode 100644
index 000000000000..21f96e95b2a6
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
@@ -0,0 +1,292 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#include "arm_coresight_pmu_nvidia.h"
+
+#define NV_MCF_PCIE_PORT_COUNT		10ULL
+#define NV_MCF_PCIE_FILTER_ID_MASK	((1ULL << NV_MCF_PCIE_PORT_COUNT) - 1)
+
+#define NV_MCF_GPU_PORT_COUNT		2ULL
+#define NV_MCF_GPU_FILTER_ID_MASK	((1ULL << NV_MCF_GPU_PORT_COUNT) - 1)
+
+#define NV_MCF_NVLINK_PORT_COUNT	4ULL
+#define NV_MCF_NVLINK_FILTER_ID_MASK	((1ULL << NV_MCF_NVLINK_PORT_COUNT) - 1)
+
+#define PMIIDR_PRODUCTID_MASK		0xFFF
+#define PMIIDR_PRODUCTID_SHIFT		20
+
+#define to_nv_pmu_impl(coresight_pmu)	\
+	(container_of(coresight_pmu->impl.ops, struct nv_pmu_impl, ops))
+
+#define CORESIGHT_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)	\
+	CORESIGHT_EVENT_ATTR(_pref##_num##_suff, _config)
+
+#define CORESIGHT_EVENT_ATTR_4(_pref, _suff, _config)			\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
+
+struct nv_pmu_impl {
+	struct coresight_pmu_impl_ops ops;
+	const char *identifier;
+	u32 filter_mask;
+	struct attribute **event_attr;
+	struct attribute **format_attr;
+};
+
+static struct attribute *scf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(bus_cycles,			0x1d),
+
+	CORESIGHT_EVENT_ATTR(scf_cache_allocate,		0xF0),
+	CORESIGHT_EVENT_ATTR(scf_cache_refill,			0xF1),
+	CORESIGHT_EVENT_ATTR(scf_cache,				0xF2),
+	CORESIGHT_EVENT_ATTR(scf_cache_wb,			0xF3),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_data,			0x101),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_rsp,			0x105),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_data,			0x109),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_data,		0x111),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_access,		0x12d),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_access,		0x131),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_access,		0x135),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_access,		0x139),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_access,		0x13d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_access,		0x141),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_outstanding,	0x151),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_outstanding,	0x155),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_data,		0x159),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
+
+	CORESIGHT_EVENT_ATTR(gmem_rd_data,			0x16d),
+	CORESIGHT_EVENT_ATTR(gmem_rd_access,			0x16e),
+	CORESIGHT_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
+	CORESIGHT_EVENT_ATTR(gmem_dl_rsp,			0x170),
+	CORESIGHT_EVENT_ATTR(gmem_dl_access,			0x171),
+	CORESIGHT_EVENT_ATTR(gmem_dl_outstanding,		0x172),
+	CORESIGHT_EVENT_ATTR(gmem_wb_data,			0x173),
+	CORESIGHT_EVENT_ATTR(gmem_wb_access,			0x174),
+	CORESIGHT_EVENT_ATTR(gmem_wb_outstanding,		0x175),
+	CORESIGHT_EVENT_ATTR(gmem_ev_rsp,			0x176),
+	CORESIGHT_EVENT_ATTR(gmem_ev_access,			0x177),
+	CORESIGHT_EVENT_ATTR(gmem_ev_outstanding,		0x178),
+	CORESIGHT_EVENT_ATTR(gmem_wr_data,			0x179),
+	CORESIGHT_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
+	CORESIGHT_EVENT_ATTR(gmem_wr_access,			0x17b),
+
+	CORESIGHT_EVENT_ATTR_4(socket, wr_data,			0x17c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_outstanding,	0x18c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_data,		0x190),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_data,		0x194),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
+
+	CORESIGHT_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
+	CORESIGHT_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_outstanding,	0x1a3),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_access,		0x1a4),
+
+	CORESIGHT_EVENT_ATTR(cmem_rd_data,			0x1a5),
+	CORESIGHT_EVENT_ATTR(cmem_rd_access,			0x1a6),
+	CORESIGHT_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
+	CORESIGHT_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
+	CORESIGHT_EVENT_ATTR(cmem_dl_access,			0x1a9),
+	CORESIGHT_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
+	CORESIGHT_EVENT_ATTR(cmem_wb_data,			0x1ab),
+	CORESIGHT_EVENT_ATTR(cmem_wb_access,			0x1ac),
+	CORESIGHT_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
+	CORESIGHT_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
+	CORESIGHT_EVENT_ATTR(cmem_ev_access,			0x1af),
+	CORESIGHT_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
+	CORESIGHT_EVENT_ATTR(cmem_wr_data,			0x1b1),
+	CORESIGHT_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_outstanding,	0x1bf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_outstanding,	0x1c3),
+
+	CORESIGHT_EVENT_ATTR(ocu_prb_access,			0x1c7),
+	CORESIGHT_EVENT_ATTR(ocu_prb_data,			0x1c8),
+	CORESIGHT_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_access,			0x1ca),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_outstanding,	0x1d7),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
+
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *mcf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(rd_bytes_loc,			0x0),
+	CORESIGHT_EVENT_ATTR(rd_bytes_rem,			0x1),
+	CORESIGHT_EVENT_ATTR(wr_bytes_loc,			0x2),
+	CORESIGHT_EVENT_ATTR(wr_bytes_rem,			0x3),
+	CORESIGHT_EVENT_ATTR(total_bytes_loc,			0x4),
+	CORESIGHT_EVENT_ATTR(total_bytes_rem,			0x5),
+	CORESIGHT_EVENT_ATTR(rd_req_loc,			0x6),
+	CORESIGHT_EVENT_ATTR(rd_req_rem,			0x7),
+	CORESIGHT_EVENT_ATTR(wr_req_loc,			0x8),
+	CORESIGHT_EVENT_ATTR(wr_req_rem,			0x9),
+	CORESIGHT_EVENT_ATTR(total_req_loc,			0xa),
+	CORESIGHT_EVENT_ATTR(total_req_rem,			0xb),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_loc,			0xc),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_rem,			0xd),
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *scf_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static struct attribute *mcf_pcie_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(root_port, "config1:0-9"),
+	NULL,
+};
+
+static struct attribute *mcf_gpu_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(gpu, "config1:0-1"),
+	NULL,
+};
+
+static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(socket, "config1:0-3"),
+	NULL,
+};
+
+static struct attribute **
+nv_coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->event_attr;
+}
+
+static struct attribute **
+nv_coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->format_attr;
+}
+
+static const char *
+nv_coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->identifier;
+}
+
+static u32 nv_coresight_pmu_event_filter(const struct perf_event *event)
+{
+	const struct nv_pmu_impl *impl =
+		to_nv_pmu_impl(to_coresight_pmu(event->pmu));
+	return event->attr.config1 & impl->filter_mask;
+}
+
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu)
+{
+	u32 product_id;
+	struct nv_pmu_impl *impl;
+
+	impl = devm_kzalloc(coresight_pmu->dev, sizeof(struct nv_pmu_impl),
+			   GFP_KERNEL);
+	if (!impl)
+		return -ENOMEM;
+
+	product_id = (coresight_pmu->impl.pmiidr >> PMIIDR_PRODUCTID_SHIFT) &
+		     PMIIDR_PRODUCTID_MASK;
+
+	switch (product_id) {
+	case 0x103:
+		impl->identifier	= "nvidia_mcf_pcie";
+		impl->filter_mask	= NV_MCF_PCIE_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_pcie_pmu_format_attrs;
+		break;
+	case 0x104:
+		impl->identifier	= "nvidia_mcf_gpuvir";
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x105:
+		impl->identifier	= "nvidia_mcf_gpu";
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x106:
+		impl->identifier	= "nvidia_mcf_nvlink";
+		impl->filter_mask	= NV_MCF_NVLINK_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_nvlink_pmu_format_attrs;
+		break;
+	case 0x2CF:
+		impl->identifier	= "nvidia_scf";
+		impl->filter_mask	= 0x0;
+		impl->event_attr	= scf_pmu_event_attrs;
+		impl->format_attr	= scf_pmu_format_attrs;
+		break;
+	default:
+		impl->identifier  = coresight_pmu_get_identifier(coresight_pmu);
+		impl->filter_mask = CORESIGHT_FILTER_MASK;
+		impl->event_attr  = coresight_pmu_get_event_attrs(coresight_pmu);
+		impl->format_attr =
+			coresight_pmu_get_format_attrs(coresight_pmu);
+		break;
+	}
+
+	impl->ops.get_event_attrs	= nv_coresight_pmu_get_event_attrs;
+	impl->ops.get_format_attrs	= nv_coresight_pmu_get_format_attrs;
+	impl->ops.get_identifier	= nv_coresight_pmu_get_identifier;
+	impl->ops.event_filter		= nv_coresight_pmu_event_filter;
+	impl->ops.event_type		= coresight_pmu_event_type;
+	impl->ops.event_attr_is_visible	= coresight_pmu_event_attr_is_visible;
+	impl->ops.is_cc_event		= coresight_pmu_is_cc_event;
+
+	coresight_pmu->impl.ops = &impl->ops;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nv_coresight_init_ops);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
new file mode 100644
index 000000000000..3c81c16c14f4
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#ifndef __ARM_CORESIGHT_PMU_NVIDIA_H__
+#define __ARM_CORESIGHT_PMU_NVIDIA_H__
+
+#include "arm_coresight_pmu.h"
+
+/* Allocate NVIDIA descriptor. */
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu);
+
+#endif /* __ARM_CORESIGHT_PMU_NVIDIA_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
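[Editorial note: the dispatch in nv_coresight_init_ops() keys off the product id field of PMIIDR, bits [31:20] per the PMIIDR_PRODUCTID_* definitions in the patch. A minimal userspace sketch of that decode follows, with the product ids taken from the switch in the patch; the helper name is a stand-in, not the driver's API:]

```c
#include <stdint.h>

#define SKETCH_PRODUCTID_MASK	0xFFFu
#define SKETCH_PRODUCTID_SHIFT	20

/* Stand-in for the product-id dispatch in nv_coresight_init_ops(). */
static const char *nv_identifier_for(uint32_t pmiidr)
{
	uint32_t product_id = (pmiidr >> SKETCH_PRODUCTID_SHIFT) &
			      SKETCH_PRODUCTID_MASK;

	switch (product_id) {
	case 0x103: return "nvidia_mcf_pcie";
	case 0x104: return "nvidia_mcf_gpuvir";
	case 0x105: return "nvidia_mcf_gpu";
	case 0x106: return "nvidia_mcf_nvlink";
	case 0x2CF: return "nvidia_scf";
	default:    return "generic";	/* falls back to the default ops */
	}
}
```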

* Re: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-05-18  7:16     ` kernel test robot
  2022-05-18 20:10       ` Besar Wicaksono
  2022-05-19  8:52     ` Suzuki K Poulose
  1 sibling, 1 reply; 31+ messages in thread
From: kernel test robot @ 2022-05-18  7:16 UTC (permalink / raw)
  To: Besar Wicaksono, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: llvm, kbuild-all, linux-arm-kernel, linux-kernel, linux-tegra,
	sudeep.holla, thanu.rangarajan, Michael.Williams, suzuki.poulose,
	treding, jonathanh, vsethi, Besar Wicaksono

Hi Besar,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on soc/for-next linus/master v5.18-rc7 next-20220517]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/intel-lab-lkp/linux/commits/Besar-Wicaksono/perf-coresight_pmu-Add-support-for-ARM-CoreSight-PMU-driver/20220516-013131
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: arm64-allyesconfig (https://download.01.org/0day-ci/archive/20220518/202205181534.wuyBFt9d-lkp@intel.com/config)
compiler: clang version 15.0.0 (https://github.com/llvm/llvm-project 853fa8ee225edf2d0de94b0dcbd31bea916e825e)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install arm64 cross compiling tool for clang build
        # apt-get install binutils-aarch64-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/79f30980a7a91e6bbe7430206e4e46fa8134cfa9
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review Besar-Wicaksono/perf-coresight_pmu-Add-support-for-ARM-CoreSight-PMU-driver/20220516-013131
        git checkout 79f30980a7a91e6bbe7430206e4e46fa8134cfa9
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=arm64 SHELL=/bin/bash drivers/media/platform/qcom/venus/ drivers/perf/coresight_pmu/ drivers/rtc/

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All warnings (new ones prefixed by >>):

   drivers/perf/coresight_pmu/arm_coresight_pmu.c:165:49: error: incomplete definition of type 'struct acpi_apmt_node'
           return CHECK_APMT_FLAG(coresight_pmu->apmt_node->flags, ATOMIC, SUPP);
                                  ~~~~~~~~~~~~~~~~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:4: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
             ^~~~~
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:165:9: error: use of undeclared identifier 'ACPI_APMT_FLAGS_ATOMIC'
           return CHECK_APMT_FLAG(coresight_pmu->apmt_node->flags, ATOMIC, SUPP);
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:13: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
                      ^
   <scratch space>:61:1: note: expanded from here
   ACPI_APMT_FLAGS_ATOMIC
   ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:165:9: error: use of undeclared identifier 'ACPI_APMT_FLAGS_ATOMIC_SUPP'
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:41: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
                                                  ^
   <scratch space>:64:1: note: expanded from here
   ACPI_APMT_FLAGS_ATOMIC_SUPP
   ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:369:13: error: incomplete definition of type 'struct acpi_apmt_node'
                   (apmt_node->impl_id) ? apmt_node->impl_id :
                    ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:369:35: error: incomplete definition of type 'struct acpi_apmt_node'
                   (apmt_node->impl_id) ? apmt_node->impl_id :
                                          ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:894:58: error: incomplete definition of type 'struct acpi_apmt_node'
                   devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);
                                                                 ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:920:31: error: incomplete definition of type 'struct acpi_apmt_node'
           if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
                               ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:4: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
             ^~~~~
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:920:6: error: use of undeclared identifier 'ACPI_APMT_FLAGS_DUAL_PAGE'
           if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
               ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:13: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
                      ^
   <scratch space>:60:1: note: expanded from here
   ACPI_APMT_FLAGS_DUAL_PAGE
   ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:920:6: error: use of undeclared identifier 'ACPI_APMT_FLAGS_DUAL_PAGE_SUPP'
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:41: note: expanded from macro 'CHECK_APMT_FLAG'
           ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
                                                  ^
   <scratch space>:63:1: note: expanded from here
   ACPI_APMT_FLAGS_DUAL_PAGE_SUPP
   ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1028:15: error: incomplete definition of type 'struct acpi_apmt_node'
           if (apmt_node->ovflw_irq == 0)
               ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
>> drivers/perf/coresight_pmu/arm_coresight_pmu.c:1053:6: warning: variable 'level' set but not used [-Wunused-but-set-variable]
           int level = 0;
               ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1079:27: error: incomplete definition of type 'struct acpi_apmt_node'
           affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
                           ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1079:37: error: use of undeclared identifier 'ACPI_APMT_FLAGS_AFFINITY'
           affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
                                              ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1081:23: error: use of undeclared identifier 'ACPI_APMT_FLAGS_AFFINITY_PROC'
           if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
                                ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1083:17: error: incomplete definition of type 'struct acpi_apmt_node'
                           if (apmt_node->proc_affinity ==
                               ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   drivers/perf/coresight_pmu/arm_coresight_pmu.c:1093:23: error: incomplete definition of type 'struct acpi_apmt_node'
                                       cpu, apmt_node->proc_affinity))
                                            ~~~~~~~~~^
   drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward declaration of 'struct acpi_apmt_node'
           struct acpi_apmt_node *apmt_node;
                  ^
   1 warning and 15 errors generated.


vim +/level +1053 drivers/perf/coresight_pmu/arm_coresight_pmu.c

  1047	
  1048	static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
  1049	{
  1050		u32 acpi_uid;
  1051		struct device *cpu_dev = get_cpu_device(cpu);
  1052		struct acpi_device *acpi_dev = ACPI_COMPANION(cpu_dev);
> 1053		int level = 0;
  1054	
  1055		if (!cpu_dev)
  1056			return -ENODEV;
  1057	
  1058		while (acpi_dev) {
  1059			if (!strcmp(acpi_device_hid(acpi_dev),
  1060				    ACPI_PROCESSOR_CONTAINER_HID) &&
  1061			    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
  1062			    acpi_uid == container_uid)
  1063				return 0;
  1064	
  1065			acpi_dev = acpi_dev->parent;
  1066			level++;
  1067		}
  1068	
  1069		return -ENODEV;
  1070	}
  1071	
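[Editorial note: the -Wunused-but-set-variable warning above comes from the 'level' counter, which is incremented but never read; the straightforward fix is simply to drop it. The toy parent-chain walk below shows the same loop shape without the dead counter — the struct and function names are illustrative stand-ins, not the kernel's ACPI types:]

```c
/*
 * Toy stand-in for coresight_pmu_find_cpu_container(): walk a parent
 * chain looking for a matching uid, without the unused 'level'
 * counter that triggered the warning.
 */
struct toy_node {
	unsigned int uid;
	struct toy_node *parent;
};

static int find_container(struct toy_node *node, unsigned int container_uid)
{
	while (node) {
		if (node->uid == container_uid)
			return 0;
		node = node->parent;
	}

	return -1;	/* stand-in for -ENODEV */
}
```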

-- 
0-DAY CI Kernel Test Service
https://01.org/lkp

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-18  7:16     ` kernel test robot
@ 2022-05-18 20:10       ` Besar Wicaksono
  0 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-18 20:10 UTC (permalink / raw)
  To: sudeep.holla, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, thanu.rangarajan,
	Michael.Williams, suzuki.poulose, Thierry Reding,
	Jonathan Hunter, Vikram Sethi

The errors on the APMT* identifiers are due to the missing ACPI patch, which was
submitted in a different series: https://lkml.org/lkml/fancy/2022/4/19/1395.
Sudeep, could you please advise whether I need to combine the ACPI and driver
patches into a single series?

I will fix the warning on the 'level' usage in the next version.

Regards,
Besar

> -----Original Message-----
> From: kernel test robot <lkp@intel.com>
> Sent: Wednesday, May 18, 2022 2:16 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: llvm@lists.linux.dev; kbuild-all@lists.01.org; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com;
> suzuki.poulose@arm.com; Thierry Reding <treding@nvidia.com>; Jonathan
> Hunter <jonathanh@nvidia.com>; Vikram Sethi <vsethi@nvidia.com>; Besar
> Wicaksono <bwicaksono@nvidia.com>
> Subject: Re: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM
> CoreSight PMU driver
> 
>            if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
>                                ~~~~~~~~~^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:4: note: expanded
> from macro 'CHECK_APMT_FLAG'
>            ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ##
> _ ## v))
>              ^~~~~
>    drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward
> declaration of 'struct acpi_apmt_node'
>            struct acpi_apmt_node *apmt_node;
>                   ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:920:6: error: use of
> undeclared identifier 'ACPI_APMT_FLAGS_DUAL_PAGE'
>            if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
>                ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:13: note: expanded
> from macro 'CHECK_APMT_FLAG'
>            ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ##
> _ ## v))
>                       ^
>    <scratch space>:60:1: note: expanded from here
>    ACPI_APMT_FLAGS_DUAL_PAGE
>    ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:920:6: error: use of
> undeclared identifier 'ACPI_APMT_FLAGS_DUAL_PAGE_SUPP'
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:129:41: note: expanded
> from macro 'CHECK_APMT_FLAG'
>            ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ##
> _ ## v))
>                                                   ^
>    <scratch space>:63:1: note: expanded from here
>    ACPI_APMT_FLAGS_DUAL_PAGE_SUPP
>    ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1028:15: error:
> incomplete definition of type 'struct acpi_apmt_node'
>            if (apmt_node->ovflw_irq == 0)
>                ~~~~~~~~~^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward
> declaration of 'struct acpi_apmt_node'
>            struct acpi_apmt_node *apmt_node;
>                   ^
> >> drivers/perf/coresight_pmu/arm_coresight_pmu.c:1053:6: warning:
> variable 'level' set but not used [-Wunused-but-set-variable]
>            int level = 0;
>                ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1079:27: error:
> incomplete definition of type 'struct acpi_apmt_node'
>            affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
>                            ~~~~~~~~~^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward
> declaration of 'struct acpi_apmt_node'
>            struct acpi_apmt_node *apmt_node;
>                   ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1079:37: error: use of
> undeclared identifier 'ACPI_APMT_FLAGS_AFFINITY'
>            affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
>                                               ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1081:23: error: use of
> undeclared identifier 'ACPI_APMT_FLAGS_AFFINITY_PROC'
>            if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
>                                 ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1083:17: error:
> incomplete definition of type 'struct acpi_apmt_node'
>                            if (apmt_node->proc_affinity ==
>                                ~~~~~~~~~^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward
> declaration of 'struct acpi_apmt_node'
>            struct acpi_apmt_node *apmt_node;
>                   ^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.c:1093:23: error:
> incomplete definition of type 'struct acpi_apmt_node'
>                                        cpu, apmt_node->proc_affinity))
>                                             ~~~~~~~~~^
>    drivers/perf/coresight_pmu/arm_coresight_pmu.h:116:9: note: forward
> declaration of 'struct acpi_apmt_node'
>            struct acpi_apmt_node *apmt_node;
>                   ^
>    1 warning and 15 errors generated.
> 
> 
> vim +/level +1053 drivers/perf/coresight_pmu/arm_coresight_pmu.c
> 
>   1047
>   1048  static inline int coresight_pmu_find_cpu_container(int cpu, u32
> container_uid)
>   1049  {
>   1050          u32 acpi_uid;
>   1051          struct device *cpu_dev = get_cpu_device(cpu);
>   1052          struct acpi_device *acpi_dev = ACPI_COMPANION(cpu_dev);
> > 1053          int level = 0;
>   1054
>   1055          if (!cpu_dev)
>   1056                  return -ENODEV;
>   1057
>   1058          while (acpi_dev) {
>   1059                  if (!strcmp(acpi_device_hid(acpi_dev),
>   1060                              ACPI_PROCESSOR_CONTAINER_HID) &&
>   1061                      !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
>   1062                      acpi_uid == container_uid)
>   1063                          return 0;
>   1064
>   1065                  acpi_dev = acpi_dev->parent;
>   1066                  level++;
>   1067          }
>   1068
>   1069          return -ENODEV;
>   1070  }
>   1071
> 
> --
> 0-DAY CI Kernel Test Service
> https://01.org/lkp

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-05-18  7:16     ` kernel test robot
@ 2022-05-19  8:52     ` Suzuki K Poulose
  2022-05-19 17:04       ` Besar Wicaksono
  1 sibling, 1 reply; 31+ messages in thread
From: Suzuki K Poulose @ 2022-05-19  8:52 UTC (permalink / raw)
  To: Besar Wicaksono, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi

On 15/05/2022 17:30, Besar Wicaksono wrote:
> Add support for ARM CoreSight PMU driver framework and interfaces.
> The driver provides generic implementation to operate uncore PMU based
> on ARM CoreSight PMU architecture. The driver also provides interface
> to get vendor/implementation specific information, for example event
> attributes and formating.
> 
> The specification used in this implementation can be found below:
>   * ACPI Arm Performance Monitoring Unit table:
>          https://developer.arm.com/documentation/den0117/latest
>   * ARM Coresight PMU architecture:
>          https://developer.arm.com/documentation/ihi0091/latest
> 
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>   arch/arm64/configs/defconfig                  |    1 +
>   drivers/perf/Kconfig                          |    2 +
>   drivers/perf/Makefile                         |    1 +
>   drivers/perf/coresight_pmu/Kconfig            |   11 +
>   drivers/perf/coresight_pmu/Makefile           |    6 +
>   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1267 +++++++++++++++++
>   .../perf/coresight_pmu/arm_coresight_pmu.h    |  171 +++
>   7 files changed, 1459 insertions(+)
>   create mode 100644 drivers/perf/coresight_pmu/Kconfig
>   create mode 100644 drivers/perf/coresight_pmu/Makefile
>   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
>   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 2ca8b1b336d2..8f2120182b25 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
>   CONFIG_PHY_TEGRA_XUSB=y
>   CONFIG_PHY_AM654_SERDES=m
>   CONFIG_PHY_J721E_WIZ=m
> +CONFIG_ARM_CORESIGHT_PMU=y
>   CONFIG_ARM_SMMU_V3_PMU=m
>   CONFIG_FSL_IMX8_DDR_PMU=m
>   CONFIG_QCOM_L2_PMU=y
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 1e2d69453771..c4e7cd5b4162 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
>   	  Enable perf support for Marvell DDR Performance monitoring
>   	  event on CN10K platform.
>   
> +source "drivers/perf/coresight_pmu/Kconfig"
> +
>   endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 57a279c61df5..4126a04b5583 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
>   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> +obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
> diff --git a/drivers/perf/coresight_pmu/Kconfig b/drivers/perf/coresight_pmu/Kconfig
> new file mode 100644
> index 000000000000..89174f54c7be
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/Kconfig
> @@ -0,0 +1,11 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +
> +config ARM_CORESIGHT_PMU
> +	tristate "ARM Coresight PMU"
> +	depends on ACPI
> +	depends on ACPI_APMT || COMPILE_TEST
> +	help
> +	  Provides support for Performance Monitoring Unit (PMU) events based on
> +	  ARM CoreSight PMU architecture.
> \ No newline at end of file
> diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
> new file mode 100644
> index 000000000000..a2a7a5fbbc16
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/Makefile
> @@ -0,0 +1,6 @@
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +#
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
> +	arm_coresight_pmu.o
> diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> new file mode 100644
> index 000000000000..36ac77ab85cd
> --- /dev/null
> +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> @@ -0,0 +1,1267 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ARM CoreSight PMU driver.
> + *
> + * This driver adds support for uncore PMU based on ARM CoreSight Performance
> + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
> + * like other uncore PMUs, it does not support process specific events and
> + * cannot be used in sampling mode.
> + *
> + * This code is based on other uncore PMUs like ARM DSU PMU. It provides a
> + * generic implementation to operate the PMU according to CoreSight PMU
> + * architecture and ACPI ARM PMU table (APMT) documents below:
> + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
> + *   - APMT document number: ARM DEN0117.
> + * The description of the PMU, like the PMU device identification, available
> + * events, and configuration options, is vendor specific. The driver provides
> + * interface for vendor specific code to get this information. This allows the
> + * driver to be shared with PMU from different vendors.
> + *
> + * CoreSight PMU devices are named as arm_coresight_pmu<node_id> where <node_id>
> + * is APMT node id. The description of the device, like the identifier,

Please see my comment below, near coresight_pmu_alloc().

> + * supported events, and formats can be found in sysfs
> + * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
> + *
> + * The user should refer to the vendor technical documentation to get details
> + * about the supported events.
> + *
> + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + *
> + */

...

> +static int coresight_pmu_alloc(struct platform_device *pdev,
> +			       struct coresight_pmu **coresight_pmu)
> +{
> +	struct acpi_apmt_node *apmt_node;
> +	struct device *dev;
> +	struct coresight_pmu *pmu;
> +
> +	dev = &pdev->dev;
> +	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> +	if (!apmt_node) {
> +		dev_err(dev, "failed to get APMT node\n");
> +		return -ENOMEM;
> +	}
> +
> +	pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
> +	if (!pmu)
> +		return -ENOMEM;
> +
> +	*coresight_pmu = pmu;
> +
> +	pmu->dev = dev;
> +	pmu->apmt_node = apmt_node;
> +	pmu->name =
> +		devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node->id);

Could we not name this "<vendor>_<ipname>_pmu" ? Or even let the
"implementor" name it ? After all, for a *normal user*, all that matters
is how to find the PMU device for my xyz IP.
The coresight_pmu architecture is there to make life easier for the
software driver and the hardware implementation. We don't need to
pass this abstraction on to end-users and make them figure out
which "arm_*_pmu<N>" is my PCIe, or SMMU, or xyz instance.

Suzuki

^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-19  8:52     ` Suzuki K Poulose
@ 2022-05-19 17:04       ` Besar Wicaksono
  0 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-19 17:04 UTC (permalink / raw)
  To: Suzuki K Poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi



> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> Sent: Thursday, May 19, 2022 3:52 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>
> Subject: Re: [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM
> CoreSight PMU driver
> 
> On 15/05/2022 17:30, Besar Wicaksono wrote:
> > Add support for ARM CoreSight PMU driver framework and interfaces.
> > The driver provides generic implementation to operate uncore PMU based
> > on ARM CoreSight PMU architecture. The driver also provides interface
> > to get vendor/implementation specific information, for example event
> > attributes and formating.
> >
> > The specification used in this implementation can be found below:
> >   * ACPI Arm Performance Monitoring Unit table:
> >          https://developer.arm.com/documentation/den0117/latest
> >   * ARM Coresight PMU architecture:
> >          https://developer.arm.com/documentation/ihi0091/latest
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >   arch/arm64/configs/defconfig                  |    1 +
> >   drivers/perf/Kconfig                          |    2 +
> >   drivers/perf/Makefile                         |    1 +
> >   drivers/perf/coresight_pmu/Kconfig            |   11 +
> >   drivers/perf/coresight_pmu/Makefile           |    6 +
> >   .../perf/coresight_pmu/arm_coresight_pmu.c    | 1267
> +++++++++++++++++
> >   .../perf/coresight_pmu/arm_coresight_pmu.h    |  171 +++
> >   7 files changed, 1459 insertions(+)
> >   create mode 100644 drivers/perf/coresight_pmu/Kconfig
> >   create mode 100644 drivers/perf/coresight_pmu/Makefile
> >   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
> >   create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index 2ca8b1b336d2..8f2120182b25 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -1196,6 +1196,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
> >   CONFIG_PHY_TEGRA_XUSB=y
> >   CONFIG_PHY_AM654_SERDES=m
> >   CONFIG_PHY_J721E_WIZ=m
> > +CONFIG_ARM_CORESIGHT_PMU=y
> >   CONFIG_ARM_SMMU_V3_PMU=m
> >   CONFIG_FSL_IMX8_DDR_PMU=m
> >   CONFIG_QCOM_L2_PMU=y
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 1e2d69453771..c4e7cd5b4162 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
> >         Enable perf support for Marvell DDR Performance monitoring
> >         event on CN10K platform.
> >
> > +source "drivers/perf/coresight_pmu/Kconfig"
> > +
> >   endmenu
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index 57a279c61df5..4126a04b5583 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) +=
> arm_dmc620_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) +=
> marvell_cn10k_tad_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) +=
> marvell_cn10k_ddr_pmu.o
> >   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
> > diff --git a/drivers/perf/coresight_pmu/Kconfig
> b/drivers/perf/coresight_pmu/Kconfig
> > new file mode 100644
> > index 000000000000..89174f54c7be
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/Kconfig
> > @@ -0,0 +1,11 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +
> > +config ARM_CORESIGHT_PMU
> > +     tristate "ARM Coresight PMU"
> > +     depends on ACPI
> > +     depends on ACPI_APMT || COMPILE_TEST
> > +     help
> > +       Provides support for Performance Monitoring Unit (PMU) events
> based on
> > +       ARM CoreSight PMU architecture.
> > \ No newline at end of file
> > diff --git a/drivers/perf/coresight_pmu/Makefile
> b/drivers/perf/coresight_pmu/Makefile
> > new file mode 100644
> > index 000000000000..a2a7a5fbbc16
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/Makefile
> > @@ -0,0 +1,6 @@
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +#
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
> > +     arm_coresight_pmu.o
> > diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> > new file mode 100644
> > index 000000000000..36ac77ab85cd
> > --- /dev/null
> > +++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
> > @@ -0,0 +1,1267 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * ARM CoreSight PMU driver.
> > + *
> > + * This driver adds support for uncore PMU based on ARM CoreSight
> Performance
> > + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers
> and
> > + * like other uncore PMUs, it does not support process specific events and
> > + * cannot be used in sampling mode.
> > + *
> > + * This code is based on other uncore PMUs like ARM DSU PMU. It
> provides a
> > + * generic implementation to operate the PMU according to CoreSight
> PMU
> > + * architecture and ACPI ARM PMU table (APMT) documents below:
> > + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091
> A.a-00bet0.
> > + *   - APMT document number: ARM DEN0117.
> > + * The description of the PMU, like the PMU device identification,
> available
> > + * events, and configuration options, is vendor specific. The driver
> provides
> > + * interface for vendor specific code to get this information. This allows
> the
> > + * driver to be shared with PMU from different vendors.
> > + *
> > + * CoreSight PMU devices are named as arm_coresight_pmu<node_id>
> where <node_id>
> > + * is APMT node id. The description of the device, like the identifier,
> 
> Please see my comment below, near coresight_pmu_alloc().
> 
> > + * supported events, and formats can be found in sysfs
> > + * /sys/bus/event_source/devices/arm_coresight_pmu<node_id>
> > + *
> > + * The user should refer to the vendor technical documentation to get
> details
> > + * about the supported events.
> > + *
> > + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > + *
> > + */
> 
> ...
> 
> > +static int coresight_pmu_alloc(struct platform_device *pdev,
> > +                            struct coresight_pmu **coresight_pmu)
> > +{
> > +     struct acpi_apmt_node *apmt_node;
> > +     struct device *dev;
> > +     struct coresight_pmu *pmu;
> > +
> > +     dev = &pdev->dev;
> > +     apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> > +     if (!apmt_node) {
> > +             dev_err(dev, "failed to get APMT node\n");
> > +             return -ENOMEM;
> > +     }
> > +
> > +     pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
> > +     if (!pmu)
> > +             return -ENOMEM;
> > +
> > +     *coresight_pmu = pmu;
> > +
> > +     pmu->dev = dev;
> > +     pmu->apmt_node = apmt_node;
> > +     pmu->name =
> > +             devm_kasprintf(dev, GFP_KERNEL, PMUNAME "%u", apmt_node-
> >id);
> 
> Could we not name this "<vendor>_<ipname>_pmu" ? Or even let the
> "implementor" name it ? After all, for a *normal user*, all that matters
> is how to find the PMU device for my xyz IP.
> The coresight_pmu architecture is there to make life easier for the
> software driver and the hardware implementation. We don't need to
> pass this abstraction on to end-users and make them figure out
> which "arm_*_pmu<N>" is my PCIe, or SMMU, or xyz instance.
> 
> Suzuki

Sure, I can add another implementor op to generate a custom name.
I will also change the default naming to the "arm_<APMT node type>_pmu"
format if an implementation-specific name is not provided.

Regards,
Besar

^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 0/2] perf: ARM CoreSight PMU support
  2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
  2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-05-15 16:30   ` [PATCH v2 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
@ 2022-05-25  6:48   ` Besar Wicaksono
  2022-05-25  6:48     ` [PATCH v3 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
                       ` (3 more replies)
  2 siblings, 4 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-25  6:48 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	Besar Wicaksono

Add driver support for the ARM CoreSight PMU device and event attributes for the
NVIDIA implementation. The code is based on the ARM CoreSight PMU architecture
and the ACPI ARM Performance Monitoring Unit table (APMT) specifications below:
 * ARM CoreSight PMU:
        https://developer.arm.com/documentation/ihi0091/latest
 * APMT: https://developer.arm.com/documentation/den0117/latest

Notes:
 * There is a concern about the naming of the PMU device.
   Currently the driver probes the "arm-coresight-pmu" device; however, the APMT
   spec supports different kinds of CoreSight PMU based implementations. So it is
   open for discussion whether the name can stay or a "generic" name is required.
   Please see the following thread:
   http://lists.infradead.org/pipermail/linux-arm-kernel/2022-May/740485.html

The patchset applies on top of
  https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  master next-20220524

Changes from v2:
 * Driver is now probing "arm-system-pmu" device.
 * Change default PMU naming to "arm_<APMT node type>_pmu".
 * Add implementor ops to generate custom name.
Thanks to suzuki.poulose@arm.com for the review comments.

Changes from v1:
 * Remove CPU arch dependency.
 * Remove 32-bit read/write helper function and just use readl/writel.
 * Add .is_visible into event attribute to filter out cycle counter event.
 * Update pmiidr matching.
 * Remove read-modify-write on PMCR since the driver only writes to PMCR.E.
 * Assign default cycle event outside the 32-bit PMEVTYPER range.
 * Rework the active event and used counter tracking.
Thanks to robin.murphy@arm.com for the review comments.

Besar Wicaksono (2):
  perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute

 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   11 +
 drivers/perf/coresight_pmu/Makefile           |    7 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1316 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  177 +++
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  312 ++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
 9 files changed, 1844 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h


base-commit: 09ce5091ff971cdbfd67ad84dc561ea27f10d67a
-- 
2.17.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* [PATCH v3 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
@ 2022-05-25  6:48     ` Besar Wicaksono
  2022-05-25  6:48     ` [PATCH v3 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-25  6:48 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	Besar Wicaksono

Add support for the ARM CoreSight PMU driver framework and interfaces.
The driver provides a generic implementation to operate uncore PMUs based
on the ARM CoreSight PMU architecture. The driver also provides an
interface to get vendor/implementation-specific information, for example
event attributes and formatting.

The specifications used in this implementation can be found below:
 * ACPI Arm Performance Monitoring Unit table:
        https://developer.arm.com/documentation/den0117/latest
 * ARM CoreSight PMU architecture:
        https://developer.arm.com/documentation/ihi0091/latest

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/coresight_pmu/Kconfig            |   11 +
 drivers/perf/coresight_pmu/Makefile           |    6 +
 .../perf/coresight_pmu/arm_coresight_pmu.c    | 1312 +++++++++++++++++
 .../perf/coresight_pmu/arm_coresight_pmu.h    |  177 +++
 7 files changed, 1510 insertions(+)
 create mode 100644 drivers/perf/coresight_pmu/Kconfig
 create mode 100644 drivers/perf/coresight_pmu/Makefile
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 7d1105343bc2..22184f8883da 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1212,6 +1212,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
 CONFIG_PHY_TEGRA_XUSB=y
 CONFIG_PHY_AM654_SERDES=m
 CONFIG_PHY_J721E_WIZ=m
+CONFIG_ARM_CORESIGHT_PMU=y
 CONFIG_ARM_SMMU_V3_PMU=m
 CONFIG_FSL_IMX8_DDR_PMU=m
 CONFIG_QCOM_L2_PMU=y
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 1e2d69453771..c4e7cd5b4162 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
 	  Enable perf support for Marvell DDR Performance monitoring
 	  event on CN10K platform.
 
+source "drivers/perf/coresight_pmu/Kconfig"
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 57a279c61df5..4126a04b5583 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += coresight_pmu/
diff --git a/drivers/perf/coresight_pmu/Kconfig b/drivers/perf/coresight_pmu/Kconfig
new file mode 100644
index 000000000000..89174f54c7be
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Kconfig
@@ -0,0 +1,11 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+
+config ARM_CORESIGHT_PMU
+	tristate "ARM Coresight PMU"
+	depends on ACPI
+	depends on ACPI_APMT || COMPILE_TEST
+	help
+	  Provides support for Performance Monitoring Unit (PMU) events based on
+	  ARM CoreSight PMU architecture.
\ No newline at end of file
diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
new file mode 100644
index 000000000000..a2a7a5fbbc16
--- /dev/null
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -0,0 +1,6 @@
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+#
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
+	arm_coresight_pmu.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
new file mode 100644
index 000000000000..ba52cc592b2d
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -0,0 +1,1312 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM CoreSight PMU driver.
+ *
+ * This driver adds support for uncore PMUs based on the ARM CoreSight
+ * Performance Monitoring Unit Architecture. The PMU is accessible via MMIO
+ * registers and, like other uncore PMUs, it does not support process-specific
+ * events and cannot be used in sampling mode.
+ *
+ * This code is modeled on other uncore PMU drivers, such as the ARM DSU PMU.
+ * It provides a generic implementation for operating the PMU according to the
+ * CoreSight PMU architecture and the ACPI ARM PMU table (APMT) documents below:
+ *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
+ *   - APMT document number: ARM DEN0117.
+ * The description of the PMU, such as its device identification, available
+ * events, and configuration options, is vendor specific. The driver provides
+ * an interface for vendor-specific code to supply this information, which
+ * allows the driver to be shared by PMUs from different vendors.
+ *
+ * The CoreSight PMU devices can be named using an implementer-specific format,
+ * or with the default naming format: arm_<apmt node type>_pmu_<numeric id>.
+ * The description of a device, such as its identifier, supported events, and
+ * formats, can be found in sysfs at
+ * /sys/bus/event_source/devices/arm_<apmt node type>_pmu_<numeric id>.
+ *
+ * Refer to the vendor's technical documentation for details about the
+ * supported events.
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <linux/ctype.h>
+#include <linux/interrupt.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/mod_devicetable.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <acpi/processor.h>
+
+#include "arm_coresight_pmu.h"
+
+#define PMUNAME "arm_system_pmu"
+
+#define CORESIGHT_CPUMASK_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_cpumask_show,		\
+			   (unsigned long)_config)
+
+/*
+ * Register offsets based on CoreSight Performance Monitoring Unit Architecture
+ * Document number: ARM-ECM-0640169 00alp6
+ */
+#define PMEVCNTR_LO					0x0
+#define PMEVCNTR_HI					0x4
+#define PMEVTYPER					0x400
+#define PMCCFILTR					0x47C
+#define PMEVFILTR					0xA00
+#define PMCNTENSET					0xC00
+#define PMCNTENCLR					0xC20
+#define PMINTENSET					0xC40
+#define PMINTENCLR					0xC60
+#define PMOVSCLR					0xC80
+#define PMOVSSET					0xCC0
+#define PMCFGR						0xE00
+#define PMCR						0xE04
+#define PMIIDR						0xE08
+
+/* PMCFGR register field */
+#define PMCFGR_NCG_SHIFT				28
+#define PMCFGR_NCG_MASK					0xf
+#define PMCFGR_HDBG					BIT(24)
+#define PMCFGR_TRO					BIT(23)
+#define PMCFGR_SS					BIT(22)
+#define PMCFGR_FZO					BIT(21)
+#define PMCFGR_MSI					BIT(20)
+#define PMCFGR_UEN					BIT(19)
+#define PMCFGR_NA					BIT(17)
+#define PMCFGR_EX					BIT(16)
+#define PMCFGR_CCD					BIT(15)
+#define PMCFGR_CC					BIT(14)
+#define PMCFGR_SIZE_SHIFT				8
+#define PMCFGR_SIZE_MASK				0x3f
+#define PMCFGR_N_SHIFT					0
+#define PMCFGR_N_MASK					0xff
+
+/* PMCR register field */
+#define PMCR_TRO					BIT(11)
+#define PMCR_HDBG					BIT(10)
+#define PMCR_FZO					BIT(9)
+#define PMCR_NA						BIT(8)
+#define PMCR_DP						BIT(5)
+#define PMCR_X						BIT(4)
+#define PMCR_D						BIT(3)
+#define PMCR_C						BIT(2)
+#define PMCR_P						BIT(1)
+#define PMCR_E						BIT(0)
+
+/* PMIIDR register field */
+#define PMIIDR_IMPLEMENTER_MASK				0xFFF
+#define PMIIDR_PRODUCTID_MASK				0xFFF
+#define PMIIDR_PRODUCTID_SHIFT				20
+
+/* Each SET/CLR register supports up to 32 counters. */
+#define CORESIGHT_SET_CLR_REG_COUNTER_NUM		32
+#define CORESIGHT_SET_CLR_REG_COUNTER_SHIFT		5
+
+/* The number of 32-bit SET/CLR registers that can be supported. */
+#define CORESIGHT_SET_CLR_REG_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
+
+static_assert((CORESIGHT_SET_CLR_REG_MAX_NUM *
+	       CORESIGHT_SET_CLR_REG_COUNTER_NUM) >=
+	      CORESIGHT_PMU_MAX_HW_CNTRS);
+
+/* Convert counter idx into SET/CLR register number. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx)				\
+	(idx >> CORESIGHT_SET_CLR_REG_COUNTER_SHIFT)
+
+/* Convert counter idx into SET/CLR register bit. */
+#define CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx)				\
+	(idx & (CORESIGHT_SET_CLR_REG_COUNTER_NUM - 1))
+
+#define CORESIGHT_ACTIVE_CPU_MASK			0x0
+#define CORESIGHT_ASSOCIATED_CPU_MASK			0x1
+
+/* Check if field f in flags is set with value v */
+#define CHECK_APMT_FLAG(flags, f, v) \
+	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
+
+static unsigned long coresight_pmu_cpuhp_state;
+
+/*
+ * In CoreSight PMU architecture, all of the MMIO registers are 32-bit except
+ * counter register. The counter register can be implemented as 32-bit or 64-bit
+ * register depending on the value of PMCFGR.SIZE field. For 64-bit access,
+ * single-copy 64-bit atomic support is implementation defined. APMT node flag
+ * is used to identify if the PMU supports 64-bit single copy atomic. If 64-bit
+ * single copy atomic is not supported, the driver treats the register as a pair
+ * of 32-bit register.
+ */
+
+/*
+ * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use high-low-high sequence to avoid tearing */
+	do {
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+/* Check if PMU supports 64-bit single copy atomic. */
+static inline bool support_atomic(const struct coresight_pmu *coresight_pmu)
+{
+	return CHECK_APMT_FLAG(coresight_pmu->apmt_node->flags, ATOMIC, SUPP);
+}
+
+/* Check if cycle counter is supported. */
+static inline bool support_cc(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr & PMCFGR_CC);
+}
+
+/* Get counter size. */
+static inline u32 pmcfgr_size(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_SIZE_SHIFT) & PMCFGR_SIZE_MASK;
+}
+
+/* Check if counter is implemented as 64-bit register. */
+static inline bool
+use_64b_counter_reg(const struct coresight_pmu *coresight_pmu)
+{
+	return (pmcfgr_size(coresight_pmu) > 31);
+}
+
+/* Get number of counters, minus one. */
+static inline u32 pmcfgr_n(const struct coresight_pmu *coresight_pmu)
+{
+	return (coresight_pmu->pmcfgr >> PMCFGR_N_SHIFT) & PMCFGR_N_MASK;
+}
+
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "event=0x%llx\n",
+			  (unsigned long long)eattr->var);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_event_show);
+
+/* Default event list. */
+static struct attribute *coresight_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return coresight_pmu_event_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_event_attrs);
+
+umode_t coresight_pmu_event_attr_is_visible(struct kobject *kobj,
+					    struct attribute *attr, int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct coresight_pmu *coresight_pmu =
+		to_coresight_pmu(dev_get_drvdata(dev));
+	struct perf_pmu_events_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr.attr);
+
+	/* Hide cycle event if not supported */
+	if (!support_cc(coresight_pmu) &&
+	    eattr->id == CORESIGHT_PMU_EVT_CYCLES_DEFAULT) {
+		return 0;
+	}
+
+	return attr->mode;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_attr_is_visible);
+
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_sysfs_format_show);
+
+static struct attribute *coresight_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_FILTER_ATTR,
+	NULL,
+};
+
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	return coresight_pmu_format_attrs;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_format_attrs);
+
+u32 coresight_pmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & CORESIGHT_EVENT_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_type);
+
+u32 coresight_pmu_event_filter(const struct perf_event *event)
+{
+	return event->attr.config1 & CORESIGHT_FILTER_MASK;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_event_filter);
+
+static ssize_t coresight_pmu_identifier_show(struct device *dev,
+					     struct device_attribute *attr,
+					     char *page)
+{
+	struct coresight_pmu *coresight_pmu =
+		to_coresight_pmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", coresight_pmu->identifier);
+}
+
+static struct device_attribute coresight_pmu_identifier_attr =
+	__ATTR(identifier, 0444, coresight_pmu_identifier_show, NULL);
+
+static struct attribute *coresight_pmu_identifier_attrs[] = {
+	&coresight_pmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_identifier_attr_group = {
+	.attrs = coresight_pmu_identifier_attrs,
+};
+
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu)
+{
+	const char *identifier =
+		devm_kasprintf(coresight_pmu->dev, GFP_KERNEL, "%x",
+			       coresight_pmu->impl.pmiidr);
+	return identifier;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_identifier);
+
+static const char *coresight_pmu_type_str[ACPI_APMT_NODE_TYPE_COUNT] = {
+	"mc",
+	"smmu",
+	"pcie",
+	"acpi",
+	"cache",
+};
+
+const char *coresight_pmu_get_name(const struct coresight_pmu *coresight_pmu)
+{
+	struct device *dev;
+	u8 pmu_type;
+	char *name;
+	char acpi_hid_string[ACPI_ID_LEN] = { 0 };
+	static atomic_t pmu_idx[ACPI_APMT_NODE_TYPE_COUNT] = { 0 };
+
+	dev = coresight_pmu->dev;
+	pmu_type = coresight_pmu->apmt_node->type;
+
+	if (pmu_type >= ACPI_APMT_NODE_TYPE_COUNT) {
+		dev_err(dev, "unsupported PMU type-%u\n", pmu_type);
+		return NULL;
+	}
+
+	if (pmu_type == ACPI_APMT_NODE_TYPE_ACPI) {
+		memcpy(acpi_hid_string,
+			&coresight_pmu->apmt_node->inst_primary,
+			sizeof(coresight_pmu->apmt_node->inst_primary));
+		name = devm_kasprintf(dev, GFP_KERNEL, "arm_%s_pmu_%s_%u",
+				      coresight_pmu_type_str[pmu_type],
+				      acpi_hid_string,
+				      coresight_pmu->apmt_node->inst_secondary);
+	} else {
+		name = devm_kasprintf(dev, GFP_KERNEL, "arm_%s_pmu_%d",
+				      coresight_pmu_type_str[pmu_type],
+				      atomic_fetch_inc(&pmu_idx[pmu_type]));
+	}
+
+	return name;
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_get_name);
+
+static ssize_t coresight_pmu_cpumask_show(struct device *dev,
+					  struct device_attribute *attr,
+					  char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case CORESIGHT_ACTIVE_CPU_MASK:
+		cpumask = &coresight_pmu->active_cpu;
+		break;
+	case CORESIGHT_ASSOCIATED_CPU_MASK:
+		cpumask = &coresight_pmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *coresight_pmu_cpumask_attrs[] = {
+	CORESIGHT_CPUMASK_ATTR(cpumask, CORESIGHT_ACTIVE_CPU_MASK),
+	CORESIGHT_CPUMASK_ATTR(associated_cpus, CORESIGHT_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static struct attribute_group coresight_pmu_cpumask_attr_group = {
+	.attrs = coresight_pmu_cpumask_attrs,
+};
+
+static const struct coresight_pmu_impl_ops default_impl_ops = {
+	.get_event_attrs	= coresight_pmu_get_event_attrs,
+	.get_format_attrs	= coresight_pmu_get_format_attrs,
+	.get_identifier		= coresight_pmu_get_identifier,
+	.get_name		= coresight_pmu_get_name,
+	.is_cc_event		= coresight_pmu_is_cc_event,
+	.event_type		= coresight_pmu_event_type,
+	.event_filter		= coresight_pmu_event_filter,
+	.event_attr_is_visible	= coresight_pmu_event_attr_is_visible
+};
+
+struct impl_match {
+	u32 pmiidr;
+	u32 mask;
+	int (*impl_init_ops)(struct coresight_pmu *coresight_pmu);
+};
+
+static const struct impl_match impl_match[] = {
+	{}
+};
+
+static int coresight_pmu_init_impl_ops(struct coresight_pmu *coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node = coresight_pmu->apmt_node;
+	const struct impl_match *match = impl_match;
+
+	/*
+	 * Get the PMU implementer and product id from the APMT node.
+	 * If the APMT node doesn't have an implementer/product id, try to
+	 * get it from the PMIIDR register.
+	 */
+	coresight_pmu->impl.pmiidr =
+		(apmt_node->impl_id) ? apmt_node->impl_id :
+				       readl(coresight_pmu->base0 + PMIIDR);
+
+	/* Find implementer-specific attribute ops. */
+	for (; match->pmiidr; match++) {
+		if ((match->pmiidr & match->mask) ==
+		    (coresight_pmu->impl.pmiidr & match->mask))
+			return match->impl_init_ops(coresight_pmu);
+	}
+
+	/* No implementer-specific attribute ops were found; use the defaults. */
+	coresight_pmu->impl.ops = &default_impl_ops;
+	return 0;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_event_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *event_group;
+	struct device *dev = coresight_pmu->dev;
+
+	event_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!event_group)
+		return NULL;
+
+	event_group->name = "events";
+	event_group->attrs =
+		coresight_pmu->impl.ops->get_event_attrs(coresight_pmu);
+	event_group->is_visible =
+		coresight_pmu->impl.ops->event_attr_is_visible;
+
+	return event_group;
+}
+
+static struct attribute_group *
+coresight_pmu_alloc_format_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = coresight_pmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs =
+		coresight_pmu->impl.ops->get_format_attrs(coresight_pmu);
+
+	return format_group;
+}
+
+static struct attribute_group **
+coresight_pmu_alloc_attr_group(struct coresight_pmu *coresight_pmu)
+{
+	const struct coresight_pmu_impl_ops *impl_ops;
+	struct attribute_group **attr_groups = NULL;
+	struct device *dev = coresight_pmu->dev;
+	int ret;
+
+	ret = coresight_pmu_init_impl_ops(coresight_pmu);
+	if (ret)
+		return NULL;
+
+	impl_ops = coresight_pmu->impl.ops;
+
+	coresight_pmu->identifier = impl_ops->get_identifier(coresight_pmu);
+	coresight_pmu->name = impl_ops->get_name(coresight_pmu);
+
+	if (!coresight_pmu->name)
+		return NULL;
+
+	attr_groups = devm_kcalloc(dev, 5, sizeof(struct attribute_group *),
+				   GFP_KERNEL);
+	if (!attr_groups)
+		return NULL;
+
+	attr_groups[0] = coresight_pmu_alloc_event_attr_group(coresight_pmu);
+	attr_groups[1] = coresight_pmu_alloc_format_attr_group(coresight_pmu);
+	attr_groups[2] = &coresight_pmu_identifier_attr_group;
+	attr_groups[3] = &coresight_pmu_cpumask_attr_group;
+
+	return attr_groups;
+}
+
+static inline void
+coresight_pmu_reset_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr = 0;
+
+	pmcr |= PMCR_P;
+	pmcr |= PMCR_C;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static inline void
+coresight_pmu_start_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = PMCR_E;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static inline void
+coresight_pmu_stop_counters(struct coresight_pmu *coresight_pmu)
+{
+	u32 pmcr;
+
+	pmcr = 0;
+	writel(pmcr, coresight_pmu->base0 + PMCR);
+}
+
+static void coresight_pmu_enable(struct pmu *pmu)
+{
+	bool disabled;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	disabled = bitmap_empty(coresight_pmu->hw_events.used_ctrs,
+				coresight_pmu->num_logical_counters);
+
+	if (disabled)
+		return;
+
+	coresight_pmu_start_counters(coresight_pmu);
+}
+
+static void coresight_pmu_disable(struct pmu *pmu)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(pmu);
+
+	coresight_pmu_stop_counters(coresight_pmu);
+}
+
+bool coresight_pmu_is_cc_event(const struct perf_event *event)
+{
+	return (event->attr.config == CORESIGHT_PMU_EVT_CYCLES_DEFAULT);
+}
+EXPORT_SYMBOL_GPL(coresight_pmu_is_cc_event);
+
+static int
+coresight_pmu_get_event_idx(struct coresight_pmu_hw_events *hw_events,
+			    struct perf_event *event)
+{
+	int idx;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (support_cc(coresight_pmu)) {
+		if (coresight_pmu->impl.ops->is_cc_event(event)) {
+			/* Try to claim the cycle counter. */
+			if (test_and_set_bit(coresight_pmu->cc_logical_idx,
+					     hw_events->used_ctrs))
+				return -EAGAIN;
+
+			return coresight_pmu->cc_logical_idx;
+		}
+
+		/*
+		 * Search for a regular counter in the used-counter bitmap.
+		 * The cycle counter bit divides the bitmap into two parts;
+		 * search the first half, then the second half, skipping the
+		 * cycle counter bit.
+		 */
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  coresight_pmu->cc_logical_idx);
+		if (idx >= coresight_pmu->cc_logical_idx) {
+			idx = find_next_zero_bit(
+				hw_events->used_ctrs,
+				coresight_pmu->num_logical_counters,
+				coresight_pmu->cc_logical_idx + 1);
+		}
+	} else {
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  coresight_pmu->num_logical_counters);
+	}
+
+	if (idx >= coresight_pmu->num_logical_counters)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool
+coresight_pmu_validate_event(struct pmu *pmu,
+			     struct coresight_pmu_hw_events *hw_events,
+			     struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return (coresight_pmu_get_event_idx(hw_events, event) >= 0);
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool coresight_pmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct coresight_pmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!coresight_pmu_validate_event(event->pmu, &fake_hw_events,
+						  sibling))
+			return false;
+	}
+
+	return coresight_pmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int coresight_pmu_event_init(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu;
+	struct hw_perf_event *hwc = &event->hw;
+
+	coresight_pmu = to_coresight_pmu(event->pmu);
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attach to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &coresight_pmu->associated_cpus)) {
+		dev_dbg(coresight_pmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&coresight_pmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!coresight_pmu_validate_group(event))
+		return -EINVAL;
+
+	/*
+	 * The logical counter id is tracked with hw_perf_event.extra_reg.idx.
+	 * The physical counter id is tracked with hw_perf_event.idx.
+	 * We don't assign an index until we actually place the event onto
+	 * hardware. Use -1 to signify that we haven't decided where to put it
+	 * yet.
+	 */
+	hwc->idx = -1;
+	hwc->extra_reg.idx = -1;
+	hwc->config_base = coresight_pmu->impl.ops->event_type(event);
+
+	return 0;
+}
+
+static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
+{
+	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
+}
+
+static void coresight_pmu_write_counter(struct perf_event *event, u64 val)
+{
+	u32 offset;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+
+		writeq(val, coresight_pmu->base1 + offset);
+	} else {
+		offset = counter_offset(sizeof(u32), event->hw.idx);
+
+		writel(lower_32_bits(val), coresight_pmu->base1 + offset);
+	}
+}
+
+static u64 coresight_pmu_read_counter(struct perf_event *event)
+{
+	u32 offset;
+	const void __iomem *counter_addr;
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+
+	if (use_64b_counter_reg(coresight_pmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+		counter_addr = coresight_pmu->base1 + offset;
+
+		return support_atomic(coresight_pmu) ?
+			       readq(counter_addr) :
+			       read_reg64_hilohi(counter_addr);
+	}
+
+	offset = counter_offset(sizeof(u32), event->hw.idx);
+	return readl(coresight_pmu->base1 + offset);
+}
+
+/*
+ * coresight_pmu_set_event_period: Set the period for the counter.
+ *
+ * To handle cases of extreme interrupt latency, we program the counter
+ * with half of its maximum count.
+ */
+static void coresight_pmu_set_event_period(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	u64 val = GENMASK_ULL(pmcfgr_size(coresight_pmu), 0) >> 1;
+
+	local64_set(&event->hw.prev_count, val);
+	coresight_pmu_write_counter(event, val);
+}
+
+static void coresight_pmu_enable_counter(struct coresight_pmu *coresight_pmu,
+					 int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENSET + (4 * reg_id);
+	cnten_off = PMCNTENSET + (4 * reg_id);
+
+	writel(BIT(reg_bit), coresight_pmu->base0 + inten_off);
+	writel(BIT(reg_bit), coresight_pmu->base0 + cnten_off);
+}
+
+static void coresight_pmu_disable_counter(struct coresight_pmu *coresight_pmu,
+					  int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = CORESIGHT_IDX_TO_SET_CLR_REG_ID(idx);
+	reg_bit = CORESIGHT_IDX_TO_SET_CLR_REG_BIT(idx);
+
+	inten_off = PMINTENCLR + (4 * reg_id);
+	cnten_off = PMCNTENCLR + (4 * reg_id);
+
+	writel(BIT(reg_bit), coresight_pmu->base0 + cnten_off);
+	writel(BIT(reg_bit), coresight_pmu->base0 + inten_off);
+}
+
+static void coresight_pmu_event_update(struct perf_event *event)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = coresight_pmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	delta = (now - prev) & GENMASK_ULL(pmcfgr_size(coresight_pmu), 0);
+	local64_add(delta, &event->count);
+}
+
+static inline void coresight_pmu_set_event(struct coresight_pmu *coresight_pmu,
+					   struct hw_perf_event *hwc)
+{
+	u32 offset = PMEVTYPER + (4 * hwc->idx);
+
+	writel(hwc->config_base, coresight_pmu->base0 + offset);
+}
+
+static inline void
+coresight_pmu_set_ev_filter(struct coresight_pmu *coresight_pmu,
+			    struct hw_perf_event *hwc, u32 filter)
+{
+	u32 offset = PMEVFILTR + (4 * hwc->idx);
+
+	writel(filter, coresight_pmu->base0 + offset);
+}
+
+static inline void
+coresight_pmu_set_cc_filter(struct coresight_pmu *coresight_pmu, u32 filter)
+{
+	u32 offset = PMCCFILTR;
+
+	writel(filter, coresight_pmu->base0 + offset);
+}
+
+static void coresight_pmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 filter;
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
+
+	coresight_pmu_set_event_period(event);
+
+	filter = coresight_pmu->impl.ops->event_filter(event);
+
+	if (event->hw.extra_reg.idx == coresight_pmu->cc_logical_idx) {
+		coresight_pmu_set_cc_filter(coresight_pmu, filter);
+	} else {
+		coresight_pmu_set_event(coresight_pmu, hwc);
+		coresight_pmu_set_ev_filter(coresight_pmu, hwc, filter);
+	}
+
+	hwc->state = 0;
+
+	coresight_pmu_enable_counter(coresight_pmu, hwc->idx);
+}
+
+static void coresight_pmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	coresight_pmu_disable_counter(coresight_pmu, hwc->idx);
+	coresight_pmu_event_update(event);
+
+	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static inline u32 to_phys_idx(struct coresight_pmu *coresight_pmu, u32 idx)
+{
+	return (idx == coresight_pmu->cc_logical_idx) ?
+		       CORESIGHT_PMU_IDX_CCNTR : idx;
+}
+
+static int coresight_pmu_add(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &coresight_pmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = coresight_pmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = to_phys_idx(coresight_pmu, idx);
+	hwc->extra_reg.idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		coresight_pmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void coresight_pmu_del(struct perf_event *event, int flags)
+{
+	struct coresight_pmu *coresight_pmu = to_coresight_pmu(event->pmu);
+	struct coresight_pmu_hw_events *hw_events = &coresight_pmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->extra_reg.idx;
+
+	coresight_pmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void coresight_pmu_read(struct perf_event *event)
+{
+	coresight_pmu_event_update(event);
+}
+
+static int coresight_pmu_alloc(struct platform_device *pdev,
+			       struct coresight_pmu **coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	struct device *dev;
+	struct coresight_pmu *pmu;
+
+	dev = &pdev->dev;
+	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
+	if (!apmt_node) {
+		dev_err(dev, "failed to get APMT node\n");
+		return -ENODEV;
+	}
+
+	pmu = devm_kzalloc(dev, sizeof(*pmu), GFP_KERNEL);
+	if (!pmu)
+		return -ENOMEM;
+
+	*coresight_pmu = pmu;
+
+	pmu->dev = dev;
+	pmu->apmt_node = apmt_node;
+
+	platform_set_drvdata(pdev, pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_init_mmio(struct coresight_pmu *coresight_pmu)
+{
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Base address for page 0. */
+	coresight_pmu->base0 = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(coresight_pmu->base0)) {
+		dev_err(dev, "ioremap failed for page-0 resource\n");
+		return PTR_ERR(coresight_pmu->base0);
+	}
+
+	/* Base address for page 1 if supported; otherwise reuse page 0. */
+	coresight_pmu->base1 = coresight_pmu->base0;
+	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
+		coresight_pmu->base1 = devm_platform_ioremap_resource(pdev, 1);
+		if (IS_ERR(coresight_pmu->base1)) {
+			dev_err(dev, "ioremap failed for page-1 resource\n");
+			return PTR_ERR(coresight_pmu->base1);
+		}
+	}
+
+	coresight_pmu->pmcfgr = readl(coresight_pmu->base0 + PMCFGR);
+
+	coresight_pmu->num_logical_counters = pmcfgr_n(coresight_pmu) + 1;
+
+	coresight_pmu->cc_logical_idx = CORESIGHT_PMU_MAX_HW_CNTRS;
+
+	if (support_cc(coresight_pmu)) {
+		/*
+		 * The last logical counter is mapped to cycle counter if
+		 * there is a gap between regular and cycle counter. Otherwise,
+		 * logical and physical have 1-to-1 mapping.
+		 */
+		coresight_pmu->cc_logical_idx =
+			(coresight_pmu->num_logical_counters <=
+			 CORESIGHT_PMU_IDX_CCNTR) ?
+				coresight_pmu->num_logical_counters - 1 :
+				CORESIGHT_PMU_IDX_CCNTR;
+	}
+
+	coresight_pmu->num_set_clr_reg =
+		DIV_ROUND_UP(coresight_pmu->num_logical_counters,
+			     CORESIGHT_SET_CLR_REG_COUNTER_NUM);
+
+	coresight_pmu->hw_events.events =
+		devm_kcalloc(dev, coresight_pmu->num_logical_counters,
+			     sizeof(*coresight_pmu->hw_events.events),
+			     GFP_KERNEL);
+
+	if (!coresight_pmu->hw_events.events)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static inline int
+coresight_pmu_get_reset_overflow(struct coresight_pmu *coresight_pmu,
+				 u32 *pmovs)
+{
+	int i;
+	u32 pmovclr_offset = PMOVSCLR;
+	u32 has_overflowed = 0;
+
+	for (i = 0; i < coresight_pmu->num_set_clr_reg; ++i) {
+		pmovs[i] = readl(coresight_pmu->base1 + pmovclr_offset);
+		has_overflowed |= pmovs[i];
+		writel(pmovs[i], coresight_pmu->base1 + pmovclr_offset);
+		pmovclr_offset += sizeof(u32);
+	}
+
+	return has_overflowed != 0;
+}
+
+static irqreturn_t coresight_pmu_handle_irq(int irq_num, void *dev)
+{
+	int idx, has_overflowed;
+	struct perf_event *event;
+	struct coresight_pmu *coresight_pmu = dev;
+	u32 pmovs[CORESIGHT_SET_CLR_REG_MAX_NUM] = { 0 };
+	bool handled = false;
+
+	coresight_pmu_stop_counters(coresight_pmu);
+
+	has_overflowed = coresight_pmu_get_reset_overflow(coresight_pmu, pmovs);
+	if (!has_overflowed)
+		goto done;
+
+	for_each_set_bit(idx, coresight_pmu->hw_events.used_ctrs,
+			 coresight_pmu->num_logical_counters) {
+		event = coresight_pmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		if (!test_bit(event->hw.idx, (unsigned long *)pmovs))
+			continue;
+
+		coresight_pmu_event_update(event);
+		coresight_pmu_set_event_period(event);
+
+		handled = true;
+	}
+
+done:
+	coresight_pmu_start_counters(coresight_pmu);
+	return IRQ_RETVAL(handled);
+}
+
+static int coresight_pmu_request_irq(struct coresight_pmu *coresight_pmu)
+{
+	int irq, ret;
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = coresight_pmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = coresight_pmu->apmt_node;
+
+	/* Skip IRQ request if the PMU does not support overflow interrupts. */
+	if (apmt_node->ovflw_irq == 0)
+		return 0;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ret = devm_request_irq(dev, irq, coresight_pmu_handle_irq,
+			       IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
+			       coresight_pmu);
+	if (ret) {
+		dev_err(dev, "Could not request IRQ %d\n", irq);
+		return ret;
+	}
+
+	coresight_pmu->irq = irq;
+
+	return 0;
+}
+
+static inline int coresight_pmu_find_cpu_container(int cpu, u32 container_uid)
+{
+	u32 acpi_uid;
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct acpi_device *acpi_dev;
+
+	if (!cpu_dev)
+		return -ENODEV;
+
+	acpi_dev = ACPI_COMPANION(cpu_dev);
+
+	while (acpi_dev) {
+		if (!strcmp(acpi_device_hid(acpi_dev),
+			    ACPI_PROCESSOR_CONTAINER_HID) &&
+		    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
+		    acpi_uid == container_uid)
+			return 0;
+
+		acpi_dev = acpi_dev->parent;
+	}
+
+	return -ENODEV;
+}
+
+static int coresight_pmu_get_cpus(struct coresight_pmu *coresight_pmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	int affinity_flag;
+	int cpu;
+
+	apmt_node = coresight_pmu->apmt_node;
+	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
+
+	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
+		for_each_possible_cpu(cpu) {
+			if (apmt_node->proc_affinity ==
+			    get_acpi_id_for_cpu(cpu)) {
+				cpumask_set_cpu(
+					cpu, &coresight_pmu->associated_cpus);
+				break;
+			}
+		}
+	} else {
+		for_each_possible_cpu(cpu) {
+			if (coresight_pmu_find_cpu_container(
+				    cpu, apmt_node->proc_affinity))
+				continue;
+
+			cpumask_set_cpu(cpu, &coresight_pmu->associated_cpus);
+		}
+	}
+
+	return 0;
+}
+
+static int coresight_pmu_register_pmu(struct coresight_pmu *coresight_pmu)
+{
+	int ret;
+	struct attribute_group **attr_groups;
+
+	attr_groups = coresight_pmu_alloc_attr_group(coresight_pmu);
+	if (!attr_groups)
+		return -ENOMEM;
+
+	ret = cpuhp_state_add_instance(coresight_pmu_cpuhp_state,
+				       &coresight_pmu->cpuhp_node);
+	if (ret)
+		return ret;
+
+	coresight_pmu->pmu = (struct pmu){
+		.task_ctx_nr	= perf_invalid_context,
+		.module		= THIS_MODULE,
+		.pmu_enable	= coresight_pmu_enable,
+		.pmu_disable	= coresight_pmu_disable,
+		.event_init	= coresight_pmu_event_init,
+		.add		= coresight_pmu_add,
+		.del		= coresight_pmu_del,
+		.start		= coresight_pmu_start,
+		.stop		= coresight_pmu_stop,
+		.read		= coresight_pmu_read,
+		.attr_groups	= (const struct attribute_group **)attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
+	};
+
+	/* Hardware counter init */
+	coresight_pmu_stop_counters(coresight_pmu);
+	coresight_pmu_reset_counters(coresight_pmu);
+
+	ret = perf_pmu_register(&coresight_pmu->pmu, coresight_pmu->name, -1);
+	if (ret) {
+		cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+					    &coresight_pmu->cpuhp_node);
+	}
+
+	return ret;
+}
+
+static int coresight_pmu_device_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct coresight_pmu *coresight_pmu;
+
+	ret = coresight_pmu_alloc(pdev, &coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_init_mmio(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_request_irq(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_get_cpus(coresight_pmu);
+	if (ret)
+		return ret;
+
+	ret = coresight_pmu_register_pmu(coresight_pmu);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int coresight_pmu_device_remove(struct platform_device *pdev)
+{
+	struct coresight_pmu *coresight_pmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&coresight_pmu->pmu);
+	cpuhp_state_remove_instance(coresight_pmu_cpuhp_state,
+				    &coresight_pmu->cpuhp_node);
+
+	return 0;
+}
+
+static struct platform_driver coresight_pmu_driver = {
+	.driver = {
+		.name = "arm-system-pmu",
+		.suppress_bind_attrs = true,
+	},
+	.probe = coresight_pmu_device_probe,
+	.remove = coresight_pmu_device_remove,
+};
+
+static void coresight_pmu_set_active_cpu(int cpu,
+					 struct coresight_pmu *coresight_pmu)
+{
+	cpumask_set_cpu(cpu, &coresight_pmu->active_cpu);
+	WARN_ON(irq_set_affinity(coresight_pmu->irq,
+				 &coresight_pmu->active_cpu));
+}
+
+static int coresight_pmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &coresight_pmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&coresight_pmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	coresight_pmu_set_active_cpu(cpu, coresight_pmu);
+
+	return 0;
+}
+
+static int coresight_pmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct cpumask online_supported;
+
+	struct coresight_pmu *coresight_pmu =
+		hlist_entry_safe(node, struct coresight_pmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &coresight_pmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	cpumask_and(&online_supported, &coresight_pmu->associated_cpus,
+		    cpu_online_mask);
+	dst = cpumask_any_but(&online_supported, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&coresight_pmu->pmu, cpu, dst);
+	coresight_pmu_set_active_cpu(dst, coresight_pmu);
+
+	return 0;
+}
+
+static int __init coresight_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN, PMUNAME,
+				      coresight_pmu_cpu_online,
+				      coresight_pmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	coresight_pmu_cpuhp_state = ret;
+	return platform_driver_register(&coresight_pmu_driver);
+}
+
+static void __exit coresight_pmu_exit(void)
+{
+	platform_driver_unregister(&coresight_pmu_driver);
+	cpuhp_remove_multi_state(coresight_pmu_cpuhp_state);
+}
+
+module_init(coresight_pmu_init);
+module_exit(coresight_pmu_exit);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.h b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
new file mode 100644
index 000000000000..88fb4cd3dafa
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.h
@@ -0,0 +1,177 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * ARM CoreSight PMU driver.
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#ifndef __ARM_CORESIGHT_PMU_H__
+#define __ARM_CORESIGHT_PMU_H__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#define to_coresight_pmu(p) (container_of(p, struct coresight_pmu, pmu))
+
+#define CORESIGHT_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define CORESIGHT_FORMAT_ATTR(_name, _config)				\
+	CORESIGHT_EXT_ATTR(_name, coresight_pmu_sysfs_format_show,	\
+			   (char *)_config)
+
+#define CORESIGHT_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, coresight_pmu_sysfs_event_show, _config)
+
+/* Default event id mask */
+#define CORESIGHT_EVENT_MASK				0xFFFFFFFFULL
+
+/* Default filter value mask */
+#define CORESIGHT_FILTER_MASK				0xFFFFFFFFULL
+
+/* Default event format */
+#define CORESIGHT_FORMAT_EVENT_ATTR CORESIGHT_FORMAT_ATTR(event, "config:0-32")
+
+/* Default filter format */
+#define CORESIGHT_FORMAT_FILTER_ATTR                                           \
+	CORESIGHT_FORMAT_ATTR(filter, "config1:0-31")
+
+/*
+ * This is the default event number for cycle count, if supported, since the
+ * ARM Coresight PMU specification does not define a standard event code
+ * for cycle count.
+ */
+#define CORESIGHT_PMU_EVT_CYCLES_DEFAULT (0x1ULL << 32)
+
+/*
+ * The ARM Coresight PMU supports up to 256 event counters.
+ * If the counters are wider than 32 bits, the PMU includes at
+ * most 128 counters.
+ */
+#define CORESIGHT_PMU_MAX_HW_CNTRS 256
+
+/* The cycle counter, if implemented, is located at counter[31]. */
+#define CORESIGHT_PMU_IDX_CCNTR 31
+
+struct coresight_pmu;
+
+/* This tracks the events assigned to each counter in the PMU. */
+struct coresight_pmu_hw_events {
+	/* The events that are active on the PMU for a given logical index. */
+	struct perf_event **events;
+
+	/*
+	 * Each bit indicates whether a logical counter is in use by an
+	 * event. If the cycle counter is supported and there is a gap
+	 * between the regular counters and the cycle counter, the last
+	 * logical counter is mapped to the cycle counter. Otherwise,
+	 * logical and physical counters map 1-to-1.
+	 */
+	DECLARE_BITMAP(used_ctrs, CORESIGHT_PMU_MAX_HW_CNTRS);
+};
+
+/* Contains ops to query vendor/implementer specific attributes. */
+struct coresight_pmu_impl_ops {
+	/* Get event attributes */
+	struct attribute **(*get_event_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get format attributes */
+	struct attribute **(*get_format_attrs)(
+		const struct coresight_pmu *coresight_pmu);
+	/* Get string identifier */
+	const char *(*get_identifier)(const struct coresight_pmu *coresight_pmu);
+	/* Get PMU name to register to core perf */
+	const char *(*get_name)(const struct coresight_pmu *coresight_pmu);
+	/* Check if the event corresponds to cycle count event */
+	bool (*is_cc_event)(const struct perf_event *event);
+	/* Decode event type/id from configs */
+	u32 (*event_type)(const struct perf_event *event);
+	/* Decode filter value from configs */
+	u32 (*event_filter)(const struct perf_event *event);
+	/* Hide/show unsupported events */
+	umode_t (*event_attr_is_visible)(struct kobject *kobj,
+					 struct attribute *attr, int unused);
+};
+
+/* Vendor/implementer descriptor. */
+struct coresight_pmu_impl {
+	u32 pmiidr;
+	const struct coresight_pmu_impl_ops *ops;
+};
+
+/* Coresight PMU descriptor. */
+struct coresight_pmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_apmt_node *apmt_node;
+	const char *name;
+	const char *identifier;
+	void __iomem *base0;
+	void __iomem *base1;
+	int irq;
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node cpuhp_node;
+
+	u32 pmcfgr;
+	u32 num_logical_counters;
+	u32 num_set_clr_reg;
+	int cc_logical_idx;
+
+	struct coresight_pmu_hw_events hw_events;
+
+	struct coresight_pmu_impl impl;
+};
+
+/* Default function to show event attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_event_show(struct device *dev,
+				       struct device_attribute *attr,
+				       char *buf);
+
+/* Default function to show format attribute in sysfs. */
+ssize_t coresight_pmu_sysfs_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf);
+
+/* Get the default Coresight PMU event attributes. */
+struct attribute **
+coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default Coresight PMU format attributes. */
+struct attribute **
+coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default Coresight PMU device identifier. */
+const char *
+coresight_pmu_get_identifier(const struct coresight_pmu *coresight_pmu);
+
+/* Get the default PMU name. */
+const char *
+coresight_pmu_get_name(const struct coresight_pmu *coresight_pmu);
+
+/* Default function to query if an event is a cycle counter event. */
+bool coresight_pmu_is_cc_event(const struct perf_event *event);
+
+/* Default function to query the type/id of an event. */
+u32 coresight_pmu_event_type(const struct perf_event *event);
+
+/* Default function to query the filter value of an event. */
+u32 coresight_pmu_event_filter(const struct perf_event *event);
+
+/* Default function that hides (default) cycle event id if not supported. */
+umode_t coresight_pmu_event_attr_is_visible(struct kobject *kobj,
+					    struct attribute *attr, int unused);
+
+#endif /* __ARM_CORESIGHT_PMU_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
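The hotplug teardown path in patch 1 (`coresight_pmu_cpu_teardown`) hands PMU ownership to another CPU that is both associated with the PMU and still online, via `cpumask_and()` plus `cpumask_any_but()`. A standalone userspace sketch of that selection logic — illustrative names only, with plain 64-bit words standing in for `cpumask_t` rather than the kernel cpumask API:

```c
#include <assert.h>

/*
 * Hypothetical sketch (not kernel code) of how a teardown handler can
 * pick a new owner CPU. NR_CPUS and pick_new_owner() are illustrative;
 * the masks are bitmaps where bit N set means CPU N is in the set.
 */
#define NR_CPUS 64

/*
 * Return the lowest CPU in (associated & online) other than `leaving`,
 * or -1 if none exists (in which case the PMU context is not migrated).
 * Similar in spirit to cpumask_and() + cpumask_any_but().
 */
static int pick_new_owner(unsigned long long associated,
			  unsigned long long online, int leaving)
{
	unsigned long long candidates = associated & online;
	int cpu;

	/* Drop the CPU that is going offline. */
	candidates &= ~(1ULL << leaving);

	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (candidates & (1ULL << cpu))
			return cpu;

	return -1;
}
```

If a destination is found, the kernel driver then calls `perf_pmu_migrate_context()` and re-points the IRQ affinity at the new owner.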

* [PATCH v3 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-05-25  6:48     ` [PATCH v3 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-05-25  6:48     ` Besar Wicaksono
  2022-06-03 15:47     ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-06-14 10:19     ` Besar Wicaksono
  3 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-05-25  6:48 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	Besar Wicaksono

Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
Fabric (MCF) PMU attributes for CoreSight PMU implementation in
NVIDIA devices.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 drivers/perf/coresight_pmu/Makefile           |   3 +-
 .../perf/coresight_pmu/arm_coresight_pmu.c    |   4 +
 .../coresight_pmu/arm_coresight_pmu_nvidia.c  | 312 ++++++++++++++++++
 .../coresight_pmu/arm_coresight_pmu_nvidia.h  |  17 +
 4 files changed, 335 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
 create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h

diff --git a/drivers/perf/coresight_pmu/Makefile b/drivers/perf/coresight_pmu/Makefile
index a2a7a5fbbc16..181b1b0dbaa1 100644
--- a/drivers/perf/coresight_pmu/Makefile
+++ b/drivers/perf/coresight_pmu/Makefile
@@ -3,4 +3,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_ARM_CORESIGHT_PMU) += \
-	arm_coresight_pmu.o
+	arm_coresight_pmu.o \
+	arm_coresight_pmu_nvidia.o
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu.c b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
index ba52cc592b2d..12179d029bfd 100644
--- a/drivers/perf/coresight_pmu/arm_coresight_pmu.c
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu.c
@@ -42,6 +42,7 @@
 #include <acpi/processor.h>
 
 #include "arm_coresight_pmu.h"
+#include "arm_coresight_pmu_nvidia.h"
 
 #define PMUNAME "arm_system_pmu"
 
@@ -396,6 +397,9 @@ struct impl_match {
 };
 
 static const struct impl_match impl_match[] = {
+	{ .pmiidr = 0x36B,
+	  .mask = PMIIDR_IMPLEMENTER_MASK,
+	  .impl_init_ops = nv_coresight_init_ops },
 	{}
 };
 
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
new file mode 100644
index 000000000000..54f4eae4c529
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
@@ -0,0 +1,312 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA-specific attributes. */
+
+#include "arm_coresight_pmu_nvidia.h"
+
+#define NV_MCF_PCIE_PORT_COUNT		10ULL
+#define NV_MCF_PCIE_FILTER_ID_MASK	((1ULL << NV_MCF_PCIE_PORT_COUNT) - 1)
+
+#define NV_MCF_GPU_PORT_COUNT		2ULL
+#define NV_MCF_GPU_FILTER_ID_MASK	((1ULL << NV_MCF_GPU_PORT_COUNT) - 1)
+
+#define NV_MCF_NVLINK_PORT_COUNT	4ULL
+#define NV_MCF_NVLINK_FILTER_ID_MASK	((1ULL << NV_MCF_NVLINK_PORT_COUNT) - 1)
+
+#define PMIIDR_PRODUCTID_MASK		0xFFF
+#define PMIIDR_PRODUCTID_SHIFT		20
+
+#define to_nv_pmu_impl(coresight_pmu)	\
+	(container_of(coresight_pmu->impl.ops, struct nv_pmu_impl, ops))
+
+#define CORESIGHT_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)	\
+	CORESIGHT_EVENT_ATTR(_pref##_num##_suff, _config)
+
+#define CORESIGHT_EVENT_ATTR_4(_pref, _suff, _config)			\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),	\
+	CORESIGHT_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
+
+struct nv_pmu_impl {
+	struct coresight_pmu_impl_ops ops;
+	const char *name;
+	u32 filter_mask;
+	struct attribute **event_attr;
+	struct attribute **format_attr;
+};
+
+static struct attribute *scf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(bus_cycles,			0x1d),
+
+	CORESIGHT_EVENT_ATTR(scf_cache_allocate,		0xF0),
+	CORESIGHT_EVENT_ATTR(scf_cache_refill,			0xF1),
+	CORESIGHT_EVENT_ATTR(scf_cache,				0xF2),
+	CORESIGHT_EVENT_ATTR(scf_cache_wb,			0xF3),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_data,			0x101),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_rsp,			0x105),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_data,			0x109),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_data,		0x111),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
+
+	CORESIGHT_EVENT_ATTR_4(socket, rd_access,		0x12d),
+	CORESIGHT_EVENT_ATTR_4(socket, dl_access,		0x131),
+	CORESIGHT_EVENT_ATTR_4(socket, wb_access,		0x135),
+	CORESIGHT_EVENT_ATTR_4(socket, wr_access,		0x139),
+	CORESIGHT_EVENT_ATTR_4(socket, ev_access,		0x13d),
+	CORESIGHT_EVENT_ATTR_4(socket, prb_access,		0x141),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_rd_outstanding,	0x151),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_outstanding,	0x155),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_data,		0x159),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
+
+	CORESIGHT_EVENT_ATTR(gmem_rd_data,			0x16d),
+	CORESIGHT_EVENT_ATTR(gmem_rd_access,			0x16e),
+	CORESIGHT_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
+	CORESIGHT_EVENT_ATTR(gmem_dl_rsp,			0x170),
+	CORESIGHT_EVENT_ATTR(gmem_dl_access,			0x171),
+	CORESIGHT_EVENT_ATTR(gmem_dl_outstanding,		0x172),
+	CORESIGHT_EVENT_ATTR(gmem_wb_data,			0x173),
+	CORESIGHT_EVENT_ATTR(gmem_wb_access,			0x174),
+	CORESIGHT_EVENT_ATTR(gmem_wb_outstanding,		0x175),
+	CORESIGHT_EVENT_ATTR(gmem_ev_rsp,			0x176),
+	CORESIGHT_EVENT_ATTR(gmem_ev_access,			0x177),
+	CORESIGHT_EVENT_ATTR(gmem_ev_outstanding,		0x178),
+	CORESIGHT_EVENT_ATTR(gmem_wr_data,			0x179),
+	CORESIGHT_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
+	CORESIGHT_EVENT_ATTR(gmem_wr_access,			0x17b),
+
+	CORESIGHT_EVENT_ATTR_4(socket, wr_data,			0x17c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
+	CORESIGHT_EVENT_ATTR_4(ocu, gmem_wb_outstanding,	0x18c),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_data,		0x190),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_data,		0x194),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
+	CORESIGHT_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
+
+	CORESIGHT_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
+	CORESIGHT_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_outstanding,	0x1a3),
+	CORESIGHT_EVENT_ATTR(remote_socket_rd_access,		0x1a4),
+
+	CORESIGHT_EVENT_ATTR(cmem_rd_data,			0x1a5),
+	CORESIGHT_EVENT_ATTR(cmem_rd_access,			0x1a6),
+	CORESIGHT_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
+	CORESIGHT_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
+	CORESIGHT_EVENT_ATTR(cmem_dl_access,			0x1a9),
+	CORESIGHT_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
+	CORESIGHT_EVENT_ATTR(cmem_wb_data,			0x1ab),
+	CORESIGHT_EVENT_ATTR(cmem_wb_access,			0x1ac),
+	CORESIGHT_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
+	CORESIGHT_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
+	CORESIGHT_EVENT_ATTR(cmem_ev_access,			0x1af),
+	CORESIGHT_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
+	CORESIGHT_EVENT_ATTR(cmem_wr_data,			0x1b1),
+	CORESIGHT_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_rd_outstanding,	0x1bf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_outstanding,	0x1c3),
+
+	CORESIGHT_EVENT_ATTR(ocu_prb_access,			0x1c7),
+	CORESIGHT_EVENT_ATTR(ocu_prb_data,			0x1c8),
+	CORESIGHT_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_access,			0x1ca),
+
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
+	CORESIGHT_EVENT_ATTR_4(ocu, cmem_wb_outstanding,	0x1d7),
+
+	CORESIGHT_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
+
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *mcf_pmu_event_attrs[] = {
+	CORESIGHT_EVENT_ATTR(rd_bytes_loc,			0x0),
+	CORESIGHT_EVENT_ATTR(rd_bytes_rem,			0x1),
+	CORESIGHT_EVENT_ATTR(wr_bytes_loc,			0x2),
+	CORESIGHT_EVENT_ATTR(wr_bytes_rem,			0x3),
+	CORESIGHT_EVENT_ATTR(total_bytes_loc,			0x4),
+	CORESIGHT_EVENT_ATTR(total_bytes_rem,			0x5),
+	CORESIGHT_EVENT_ATTR(rd_req_loc,			0x6),
+	CORESIGHT_EVENT_ATTR(rd_req_rem,			0x7),
+	CORESIGHT_EVENT_ATTR(wr_req_loc,			0x8),
+	CORESIGHT_EVENT_ATTR(wr_req_rem,			0x9),
+	CORESIGHT_EVENT_ATTR(total_req_loc,			0xa),
+	CORESIGHT_EVENT_ATTR(total_req_rem,			0xb),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_loc,			0xc),
+	CORESIGHT_EVENT_ATTR(rd_cum_outs_rem,			0xd),
+	CORESIGHT_EVENT_ATTR(cycles, CORESIGHT_PMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *scf_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static struct attribute *mcf_pcie_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(root_port, "config1:0-9"),
+	NULL,
+};
+
+static struct attribute *mcf_gpu_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(gpu, "config1:0-1"),
+	NULL,
+};
+
+static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
+	CORESIGHT_FORMAT_EVENT_ATTR,
+	CORESIGHT_FORMAT_ATTR(socket, "config1:0-3"),
+	NULL,
+};
+
+static struct attribute **
+nv_coresight_pmu_get_event_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->event_attr;
+}
+
+static struct attribute **
+nv_coresight_pmu_get_format_attrs(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->format_attr;
+}
+
+static const char *
+nv_coresight_pmu_get_name(const struct coresight_pmu *coresight_pmu)
+{
+	const struct nv_pmu_impl *impl = to_nv_pmu_impl(coresight_pmu);
+
+	return impl->name;
+}
+
+static u32 nv_coresight_pmu_event_filter(const struct perf_event *event)
+{
+	const struct nv_pmu_impl *impl =
+		to_nv_pmu_impl(to_coresight_pmu(event->pmu));
+
+	return event->attr.config1 & impl->filter_mask;
+}
+
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu)
+{
+	u32 product_id;
+	struct device *dev;
+	struct nv_pmu_impl *impl;
+	static atomic_t pmu_idx = ATOMIC_INIT(0);
+
+	dev = coresight_pmu->dev;
+
+	impl = devm_kzalloc(dev, sizeof(struct nv_pmu_impl), GFP_KERNEL);
+	if (!impl)
+		return -ENOMEM;
+
+	product_id = (coresight_pmu->impl.pmiidr >> PMIIDR_PRODUCTID_SHIFT) &
+		     PMIIDR_PRODUCTID_MASK;
+
+	switch (product_id) {
+	case 0x103:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL,
+				       "nvidia_mcf_pcie_pmu_%u",
+				       coresight_pmu->apmt_node->proc_affinity);
+		impl->filter_mask	= NV_MCF_PCIE_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_pcie_pmu_format_attrs;
+		break;
+	case 0x104:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL,
+				       "nvidia_mcf_gpuvir_pmu_%u",
+				       coresight_pmu->apmt_node->proc_affinity);
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x105:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL,
+				       "nvidia_mcf_gpu_pmu_%u",
+				       coresight_pmu->apmt_node->proc_affinity);
+		impl->filter_mask	= NV_MCF_GPU_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_gpu_pmu_format_attrs;
+		break;
+	case 0x106:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL,
+				       "nvidia_mcf_nvlink_pmu_%u",
+				       coresight_pmu->apmt_node->proc_affinity);
+		impl->filter_mask	= NV_MCF_NVLINK_FILTER_ID_MASK;
+		impl->event_attr	= mcf_pmu_event_attrs;
+		impl->format_attr	= mcf_nvlink_pmu_format_attrs;
+		break;
+	case 0x2CF:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL, "nvidia_scf_pmu_%u",
+				       coresight_pmu->apmt_node->proc_affinity);
+		impl->filter_mask	= 0x0;
+		impl->event_attr	= scf_pmu_event_attrs;
+		impl->format_attr	= scf_pmu_format_attrs;
+		break;
+	default:
+		impl->name =
+			devm_kasprintf(dev, GFP_KERNEL, "nvidia_uncore_pmu_%u",
+				       atomic_fetch_inc(&pmu_idx));
+		impl->filter_mask = CORESIGHT_FILTER_MASK;
+		impl->event_attr  = coresight_pmu_get_event_attrs(coresight_pmu);
+		impl->format_attr =
+			coresight_pmu_get_format_attrs(coresight_pmu);
+		break;
+	}
+
+	if (!impl->name)
+		return -ENOMEM;
+
+	impl->ops.get_event_attrs	= nv_coresight_pmu_get_event_attrs;
+	impl->ops.get_format_attrs	= nv_coresight_pmu_get_format_attrs;
+	impl->ops.get_identifier	= coresight_pmu_get_identifier;
+	impl->ops.get_name		= nv_coresight_pmu_get_name;
+	impl->ops.event_filter		= nv_coresight_pmu_event_filter;
+	impl->ops.event_type		= coresight_pmu_event_type;
+	impl->ops.event_attr_is_visible	= coresight_pmu_event_attr_is_visible;
+	impl->ops.is_cc_event		= coresight_pmu_is_cc_event;
+
+	coresight_pmu->impl.ops = &impl->ops;
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nv_coresight_init_ops);
diff --git a/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
new file mode 100644
index 000000000000..3c81c16c14f4
--- /dev/null
+++ b/drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA-specific attributes. */
+
+#ifndef __ARM_CORESIGHT_PMU_NVIDIA_H__
+#define __ARM_CORESIGHT_PMU_NVIDIA_H__
+
+#include "arm_coresight_pmu.h"
+
+/* Allocate NVIDIA descriptor. */
+int nv_coresight_init_ops(struct coresight_pmu *coresight_pmu);
+
+#endif /* __ARM_CORESIGHT_PMU_NVIDIA_H__ */
-- 
2.17.1


^ permalink raw reply related	[flat|nested] 31+ messages in thread
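Patch 2 hooks into the generic driver by matching the PMIIDR implementer field against NVIDIA's JEP106 code (0x36B), then switches on the product id in PMIIDR[31:20] to pick SCF/MCF attributes. A minimal sketch of that decoding, assuming the field layout implied by the masks in the patch (`PMIIDR_IMPLEMENTER_MASK` is taken from patch 1 and assumed here to be 0xFFF; function names are illustrative):

```c
#include <assert.h>

/* Field layout assumed from the masks/shifts in the patch series. */
#define PMIIDR_IMPLEMENTER_MASK	0xFFFu	/* assumption, per patch 1 */
#define PMIIDR_PRODUCTID_MASK	0xFFFu
#define PMIIDR_PRODUCTID_SHIFT	20

/* Extract the product id, as nv_coresight_init_ops() does. */
static unsigned int pmiidr_product_id(unsigned int pmiidr)
{
	return (pmiidr >> PMIIDR_PRODUCTID_SHIFT) & PMIIDR_PRODUCTID_MASK;
}

/* Implementer match, as the impl_match[] table entry does. */
static int is_nvidia_impl(unsigned int pmiidr)
{
	/* JEP106 code 0x36B identifies NVIDIA in this driver. */
	return (pmiidr & PMIIDR_IMPLEMENTER_MASK) == 0x36B;
}
```

With this layout, a PMIIDR value of 0x2CF0036B matches the NVIDIA implementer and decodes to product id 0x2CF, which the switch in `nv_coresight_init_ops()` maps to the SCF PMU attributes.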

* RE: [PATCH v3 0/2] perf: ARM CoreSight PMU support
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-05-25  6:48     ` [PATCH v3 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-05-25  6:48     ` [PATCH v3 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
@ 2022-06-03 15:47     ` Besar Wicaksono
  2022-06-14 10:19     ` Besar Wicaksono
  3 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-06-03 15:47 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi

Hello,

I was wondering if there are any other comments that need to be addressed in this patch?

Regards,
Besar

> -----Original Message-----
> From: Besar Wicaksono <bwicaksono@nvidia.com>
> Sent: Wednesday, May 25, 2022 1:49 AM
> To: suzuki.poulose@arm.com; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; Besar Wicaksono <bwicaksono@nvidia.com>
> Subject: [PATCH v3 0/2] perf: ARM CoreSight PMU support
> 
> Add driver support for ARM CoreSight PMU device and event attributes for
> NVIDIA
> implementation. The code is based on ARM Coresight PMU architecture and
> ACPI ARM
> Performance Monitoring Unit table (APMT) specification below:
>  * ARM Coresight PMU:
>         https://developer.arm.com/documentation/ihi0091/latest
>  * APMT: https://developer.arm.com/documentation/den0117/latest
> 
> Notes:
>  * There is a concern on the naming of the PMU device.
>    Currently the driver is probing "arm-coresight-pmu" device, however the
> APMT
>    spec supports different kinds of CoreSight PMU based implementation. So
> it is
>    open for discussion if the name can stay or a "generic" name is required.
>    Please see the following thread:
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2022-
> May/740485.html
> 
> The patchset applies on top of
>   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   master next-20220524
> 
> Changes from v2:
>  * Driver is now probing "arm-system-pmu" device.
>  * Change default PMU naming to "arm_<APMT node type>_pmu".
>  * Add implementor ops to generate custom name.
> Thanks to suzuki.poulose@arm.com for the review comments.
> 
> Changes from v1:
>  * Remove CPU arch dependency.
>  * Remove 32-bit read/write helper function and just use read/writel.
>  * Add .is_visible into event attribute to filter out cycle counter event.
>  * Update pmiidr matching.
>  * Remove read-modify-write on PMCR since the driver only writes to
> PMCR.E.
>  * Assign default cycle event outside the 32-bit PMEVTYPER range.
>  * Rework the active event and used counter tracking.
> Thanks to robin.murphy@arm.com for the review comments.
> 
> Besar Wicaksono (2):
>   perf: coresight_pmu: Add support for ARM CoreSight PMU driver
>   perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
> 
>  arch/arm64/configs/defconfig                  |    1 +
>  drivers/perf/Kconfig                          |    2 +
>  drivers/perf/Makefile                         |    1 +
>  drivers/perf/coresight_pmu/Kconfig            |   11 +
>  drivers/perf/coresight_pmu/Makefile           |    7 +
>  .../perf/coresight_pmu/arm_coresight_pmu.c    | 1316
> +++++++++++++++++
>  .../perf/coresight_pmu/arm_coresight_pmu.h    |  177 +++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  312 ++++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>  9 files changed, 1844 insertions(+)
>  create mode 100644 drivers/perf/coresight_pmu/Kconfig
>  create mode 100644 drivers/perf/coresight_pmu/Makefile
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
>  create mode 100644
> drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
>  create mode 100644
> drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
> 
> 
> base-commit: 09ce5091ff971cdbfd67ad84dc561ea27f10d67a
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

* RE: [PATCH v3 0/2] perf: ARM CoreSight PMU support
  2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
                       ` (2 preceding siblings ...)
  2022-06-03 15:47     ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
@ 2022-06-14 10:19     ` Besar Wicaksono
  3 siblings, 0 replies; 31+ messages in thread
From: Besar Wicaksono @ 2022-06-14 10:19 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi

Gentle ping. Any comment or suggestion is appreciated.

Thank you,
Besar

> -----Original Message-----
> From: Besar Wicaksono <bwicaksono@nvidia.com>
> Sent: Wednesday, May 25, 2022 1:49 PM
> To: suzuki.poulose@arm.com; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; Besar Wicaksono <bwicaksono@nvidia.com>
> Subject: [PATCH v3 0/2] perf: ARM CoreSight PMU support
> 
> Add driver support for ARM CoreSight PMU device and event attributes for
> NVIDIA
> implementation. The code is based on ARM Coresight PMU architecture and
> ACPI ARM
> Performance Monitoring Unit table (APMT) specification below:
>  * ARM Coresight PMU:
>         https://developer.arm.com/documentation/ihi0091/latest
>  * APMT: https://developer.arm.com/documentation/den0117/latest
> 
> Notes:
>  * There is a concern on the naming of the PMU device.
>    Currently the driver is probing "arm-coresight-pmu" device, however the
> APMT
>    spec supports different kinds of CoreSight PMU based implementation. So
> it is
>    open for discussion if the name can stay or a "generic" name is required.
>    Please see the following thread:
>    http://lists.infradead.org/pipermail/linux-arm-kernel/2022-
> May/740485.html
> 
> The patchset applies on top of
>   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   master next-20220524
> 
> Changes from v2:
>  * Driver is now probing "arm-system-pmu" device.
>  * Change default PMU naming to "arm_<APMT node type>_pmu".
>  * Add implementor ops to generate custom name.
> Thanks to suzuki.poulose@arm.com for the review comments.
> 
> Changes from v1:
>  * Remove CPU arch dependency.
>  * Remove the 32-bit read/write helper functions and just use readl/writel.
>  * Add .is_visible into event attribute to filter out cycle counter event.
>  * Update pmiidr matching.
>  * Remove read-modify-write on PMCR since the driver only writes to PMCR.E.
>  * Assign default cycle event outside the 32-bit PMEVTYPER range.
>  * Rework the active event and used counter tracking.
> Thanks to robin.murphy@arm.com for the review comments.
> 
> Besar Wicaksono (2):
>   perf: coresight_pmu: Add support for ARM CoreSight PMU driver
>   perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute
> 
>  arch/arm64/configs/defconfig                  |    1 +
>  drivers/perf/Kconfig                          |    2 +
>  drivers/perf/Makefile                         |    1 +
>  drivers/perf/coresight_pmu/Kconfig            |   11 +
>  drivers/perf/coresight_pmu/Makefile           |    7 +
>  .../perf/coresight_pmu/arm_coresight_pmu.c    | 1316 +++++++++++++++++
>  .../perf/coresight_pmu/arm_coresight_pmu.h    |  177 +++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.c  |  312 ++++
>  .../coresight_pmu/arm_coresight_pmu_nvidia.h  |   17 +
>  9 files changed, 1844 insertions(+)
>  create mode 100644 drivers/perf/coresight_pmu/Kconfig
>  create mode 100644 drivers/perf/coresight_pmu/Makefile
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.c
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu.h
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.c
>  create mode 100644 drivers/perf/coresight_pmu/arm_coresight_pmu_nvidia.h
> 
> 
> base-commit: 09ce5091ff971cdbfd67ad84dc561ea27f10d67a
> --
> 2.17.1


^ permalink raw reply	[flat|nested] 31+ messages in thread

end of thread, other threads:[~2022-06-14 10:19 UTC | newest]

Thread overview: 31+ messages
2022-05-09  0:28 [PATCH 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
2022-05-09  0:28 ` [PATCH 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
2022-05-09 12:13   ` Robin Murphy
2022-05-11  2:46     ` Besar Wicaksono
2022-05-11 10:03       ` Robin Murphy
2022-05-09  0:28 ` [PATCH 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
2022-05-09  9:28 ` [PATCH 0/2] perf: ARM CoreSight PMU support Will Deacon
2022-05-09 10:02   ` Suzuki K Poulose
2022-05-09 12:20     ` Shaokun Zhang
2022-05-09 22:07     ` Besar Wicaksono
2022-05-10 11:07     ` Sudeep Holla
2022-05-10 11:13       ` Will Deacon
2022-05-10 18:40         ` Sudeep Holla
2022-05-11  1:29           ` Besar Wicaksono
2022-05-11 12:42             ` Robin Murphy
2022-05-13  6:16               ` Thanu Rangarajan
2022-05-11  8:44         ` Suzuki K Poulose
2022-05-11 16:44           ` Besar Wicaksono
2022-05-13 12:25             ` Besar Wicaksono
2022-05-15 16:30 ` [PATCH v2 " Besar Wicaksono
2022-05-15 16:30   ` [PATCH v2 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
2022-05-18  7:16     ` kernel test robot
2022-05-18 20:10       ` Besar Wicaksono
2022-05-19  8:52     ` Suzuki K Poulose
2022-05-19 17:04       ` Besar Wicaksono
2022-05-15 16:30   ` [PATCH v2 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
2022-05-25  6:48   ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
2022-05-25  6:48     ` [PATCH v3 1/2] perf: coresight_pmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
2022-05-25  6:48     ` [PATCH v3 2/2] perf: coresight_pmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
2022-06-03 15:47     ` [PATCH v3 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
2022-06-14 10:19     ` Besar Wicaksono
