* [PATCH v4 0/2] perf: ARM CoreSight PMU support
@ 2022-08-14 18:23 Besar Wicaksono
  2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
                   ` (2 more replies)
  0 siblings, 3 replies; 13+ messages in thread
From: Besar Wicaksono @ 2022-08-14 18:23 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan, Besar Wicaksono

Add driver support for the ARM CoreSight PMU device, plus event attributes for
the NVIDIA implementation. The code is based on the ARM CoreSight PMU
architecture and the ACPI Arm Performance Monitoring Unit table (APMT)
specifications below:
 * ARM CoreSight PMU:
        https://developer.arm.com/documentation/ihi0091/latest
 * APMT: https://developer.arm.com/documentation/den0117/latest

The patchset applies on top of
  https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
  master next-20220524

For APMT support, please see the patchset: https://lkml.org/lkml/2022/4/19/1395

Changes from v3:
 * The driver now probes the "arm-cs-arch-pmu" device.
 * The driver files, directory, and functions are renamed with an "arm_cspmu" prefix.
 * Use the ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU Kconfig symbol.
 * Add kernel documentation for the NVIDIA uncore PMU.
 * Use the GENMASK and FIELD_GET macros everywhere.
Thanks to suzuki.poulose@arm.com and will@kernel.org for the review comments.
v3: https://lore.kernel.org/linux-arm-kernel/20220621055035.31766-1-bwicaksono@nvidia.com/

Changes from v2:
 * The driver now probes the "arm-system-pmu" device.
 * Change the default PMU naming to "arm_<APMT node type>_pmu".
 * Add implementer ops to generate a custom name.
Thanks to suzuki.poulose@arm.com for the review comments.
v2: https://lore.kernel.org/linux-arm-kernel/20220515163044.50055-1-bwicaksono@nvidia.com/

Changes from v1:
 * Remove the CPU arch dependency.
 * Remove the 32-bit read/write helper functions and just use readl/writel.
 * Add .is_visible to the event attributes to filter out the cycle counter event.
 * Update PMIIDR matching.
 * Remove the read-modify-write on PMCR since the driver only writes to PMCR.E.
 * Assign the default cycle event outside the 32-bit PMEVTYPER range.
 * Rework the active event and used counter tracking.
Thanks to robin.murphy@arm.com for the review comments.
v1: https://lore.kernel.org/linux-arm-kernel/20220509002810.12412-1-bwicaksono@nvidia.com/

Besar Wicaksono (2):
  perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute

 Documentation/admin-guide/perf/index.rst      |    1 +
 Documentation/admin-guide/perf/nvidia-pmu.rst |  120 ++
 arch/arm64/configs/defconfig                  |    1 +
 drivers/perf/Kconfig                          |    2 +
 drivers/perf/Makefile                         |    1 +
 drivers/perf/arm_cspmu/Kconfig                |   13 +
 drivers/perf/arm_cspmu/Makefile               |    7 +
 drivers/perf/arm_cspmu/arm_cspmu.c            | 1269 +++++++++++++++++
 drivers/perf/arm_cspmu/arm_cspmu.h            |  151 ++
 drivers/perf/arm_cspmu/nvidia_cspmu.c         |  367 +++++
 drivers/perf/arm_cspmu/nvidia_cspmu.h         |   17 +
 11 files changed, 1949 insertions(+)
 create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
 create mode 100644 drivers/perf/arm_cspmu/Kconfig
 create mode 100644 drivers/perf/arm_cspmu/Makefile
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h


base-commit: 09ce5091ff971cdbfd67ad84dc561ea27f10d67a
-- 
2.17.1



* [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-08-14 18:23 [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
@ 2022-08-14 18:23 ` Besar Wicaksono
  2022-09-22 13:52   ` Will Deacon
  2022-09-27 11:39   ` Suzuki K Poulose
  2022-08-14 18:23 ` [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
  2022-08-23 17:24 ` [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2 siblings, 2 replies; 13+ messages in thread
From: Besar Wicaksono @ 2022-08-14 18:23 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan, Besar Wicaksono

Add support for the ARM CoreSight PMU driver framework and interfaces.
The driver provides a generic implementation to operate uncore PMUs based
on the ARM CoreSight PMU architecture. The driver also provides an
interface to get vendor/implementation-specific information, for example
event attributes and formatting.

The specifications used in this implementation can be found below:
 * ACPI Arm Performance Monitoring Unit table:
        https://developer.arm.com/documentation/den0117/latest
 * ARM CoreSight PMU architecture:
        https://developer.arm.com/documentation/ihi0091/latest

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 arch/arm64/configs/defconfig       |    1 +
 drivers/perf/Kconfig               |    2 +
 drivers/perf/Makefile              |    1 +
 drivers/perf/arm_cspmu/Kconfig     |   13 +
 drivers/perf/arm_cspmu/Makefile    |    6 +
 drivers/perf/arm_cspmu/arm_cspmu.c | 1262 ++++++++++++++++++++++++++++
 drivers/perf/arm_cspmu/arm_cspmu.h |  151 ++++
 7 files changed, 1436 insertions(+)
 create mode 100644 drivers/perf/arm_cspmu/Kconfig
 create mode 100644 drivers/perf/arm_cspmu/Makefile
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h

diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
index 7d1105343bc2..ee31c9159a5b 100644
--- a/arch/arm64/configs/defconfig
+++ b/arch/arm64/configs/defconfig
@@ -1212,6 +1212,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
 CONFIG_PHY_TEGRA_XUSB=y
 CONFIG_PHY_AM654_SERDES=m
 CONFIG_PHY_J721E_WIZ=m
+CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=y
 CONFIG_ARM_SMMU_V3_PMU=m
 CONFIG_FSL_IMX8_DDR_PMU=m
 CONFIG_QCOM_L2_PMU=y
diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 1e2d69453771..c94d3601eb48 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
 	  Enable perf support for Marvell DDR Performance monitoring
 	  event on CN10K platform.
 
+source "drivers/perf/arm_cspmu/Kconfig"
+
 endmenu
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index 57a279c61df5..3bc9323f0965 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
+obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
diff --git a/drivers/perf/arm_cspmu/Kconfig b/drivers/perf/arm_cspmu/Kconfig
new file mode 100644
index 000000000000..c2c56ecafccb
--- /dev/null
+++ b/drivers/perf/arm_cspmu/Kconfig
@@ -0,0 +1,13 @@
+# SPDX-License-Identifier: GPL-2.0
+#
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+
+config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
+	tristate "ARM Coresight Architecture PMU"
+	depends on ACPI
+	depends on ACPI_APMT || COMPILE_TEST
+	help
+	  Provides support for performance monitoring unit (PMU) devices
+	  based on ARM CoreSight PMU architecture. Note that this PMU
+	  architecture does not have relationship with the ARM CoreSight
+	  Self-Hosted Tracing.
diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
new file mode 100644
index 000000000000..cdc3455f74d8
--- /dev/null
+++ b/drivers/perf/arm_cspmu/Makefile
@@ -0,0 +1,6 @@
+# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+#
+# SPDX-License-Identifier: GPL-2.0
+
+obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
+	arm_cspmu.o
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
new file mode 100644
index 000000000000..410876f86eb0
--- /dev/null
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -0,0 +1,1262 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * ARM CoreSight Architecture PMU driver.
+ *
+ * This driver adds support for uncore PMUs based on the ARM CoreSight
+ * Performance Monitoring Unit architecture. The PMU is accessible via MMIO
+ * registers and, like other uncore PMUs, it does not support process-specific
+ * events and cannot be used in sampling mode.
+ *
+ * This code is based on other uncore PMUs like the ARM DSU PMU. It provides a
+ * generic implementation to operate the PMU according to the CoreSight PMU
+ * architecture and the ACPI ARM PMU table (APMT) documents below:
+ *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
+ *   - APMT document number: ARM DEN0117.
+ *
+ * The user should refer to the vendor technical documentation to get details
+ * about the supported events.
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#include <linux/acpi.h>
+#include <linux/cacheinfo.h>
+#include <linux/ctype.h>
+#include <linux/interrupt.h>
+#include <linux/io-64-nonatomic-lo-hi.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <acpi/processor.h>
+
+#include "arm_cspmu.h"
+
+#define PMUNAME "arm_cspmu"
+#define DRVNAME "arm-cs-arch-pmu"
+
+#define ARM_CSPMU_CPUMASK_ATTR(_name, _config)			\
+	ARM_CSPMU_EXT_ATTR(_name, arm_cspmu_cpumask_show,	\
+				(unsigned long)_config)
+
+/*
+ * CoreSight PMU Arch register offsets.
+ */
+#define PMEVCNTR_LO					0x0
+#define PMEVCNTR_HI					0x4
+#define PMEVTYPER					0x400
+#define PMCCFILTR					0x47C
+#define PMEVFILTR					0xA00
+#define PMCNTENSET					0xC00
+#define PMCNTENCLR					0xC20
+#define PMINTENSET					0xC40
+#define PMINTENCLR					0xC60
+#define PMOVSCLR					0xC80
+#define PMOVSSET					0xCC0
+#define PMCFGR						0xE00
+#define PMCR						0xE04
+#define PMIIDR						0xE08
+
+/* PMCFGR register field */
+#define PMCFGR_NCG					GENMASK(31, 28)
+#define PMCFGR_HDBG					BIT(24)
+#define PMCFGR_TRO					BIT(23)
+#define PMCFGR_SS					BIT(22)
+#define PMCFGR_FZO					BIT(21)
+#define PMCFGR_MSI					BIT(20)
+#define PMCFGR_UEN					BIT(19)
+#define PMCFGR_NA					BIT(17)
+#define PMCFGR_EX					BIT(16)
+#define PMCFGR_CCD					BIT(15)
+#define PMCFGR_CC					BIT(14)
+#define PMCFGR_SIZE					GENMASK(13, 8)
+#define PMCFGR_N					GENMASK(7, 0)
+
+/* PMCR register field */
+#define PMCR_TRO					BIT(11)
+#define PMCR_HDBG					BIT(10)
+#define PMCR_FZO					BIT(9)
+#define PMCR_NA						BIT(8)
+#define PMCR_DP						BIT(5)
+#define PMCR_X						BIT(4)
+#define PMCR_D						BIT(3)
+#define PMCR_C						BIT(2)
+#define PMCR_P						BIT(1)
+#define PMCR_E						BIT(0)
+
+/* Each SET/CLR register supports up to 32 counters. */
+#define ARM_CSPMU_SET_CLR_COUNTER_SHIFT		5
+#define ARM_CSPMU_SET_CLR_COUNTER_NUM		\
+	(1 << ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
+
+/* The number of 32-bit SET/CLR registers that can be supported. */
+#define ARM_CSPMU_SET_CLR_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
+
+static_assert(
+	(ARM_CSPMU_SET_CLR_MAX_NUM * ARM_CSPMU_SET_CLR_COUNTER_NUM) >=
+	ARM_CSPMU_MAX_HW_CNTRS);
+
+/* Convert counter idx into SET/CLR register number. */
+#define COUNTER_TO_SET_CLR_ID(idx)			\
+	(idx >> ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
+
+/* Convert counter idx into SET/CLR register bit. */
+#define COUNTER_TO_SET_CLR_BIT(idx)			\
+	(idx & (ARM_CSPMU_SET_CLR_COUNTER_NUM - 1))
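+/*
+ * Example: counter idx 37 maps to SET/CLR register 1 (37 >> 5) and
+ * register bit 5 (37 & 31).
+ */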
+
+#define ARM_CSPMU_ACTIVE_CPU_MASK		0x0
+#define ARM_CSPMU_ASSOCIATED_CPU_MASK		0x1
+
+/* Check if field f in flags is set with value v */
+#define CHECK_APMT_FLAG(flags, f, v) \
+	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
+
+/* Check and use the default if the implementer doesn't provide a callback */
+#define CHECK_DEFAULT_IMPL_OPS(ops, callback)			\
+	do {							\
+		if (!ops->callback)				\
+			ops->callback = arm_cspmu_ ## callback;	\
+	} while (0)
+
+static unsigned long arm_cspmu_cpuhp_state;
+
+/*
+ * In the CoreSight PMU architecture, all of the MMIO registers are 32-bit
+ * except the counter register. The counter register can be implemented as a
+ * 32-bit or 64-bit register depending on the value of the PMCFGR.SIZE field.
+ * For 64-bit access, single-copy 64-bit atomic support is implementation
+ * defined. The APMT node flag is used to identify if the PMU supports 64-bit
+ * single-copy atomics. If not, the driver treats the register as a pair of
+ * 32-bit registers.
+ */
+
+/*
+ * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
+ */
+static u64 read_reg64_hilohi(const void __iomem *addr)
+{
+	u32 val_lo, val_hi;
+	u64 val;
+
+	/* Use high-low-high sequence to avoid tearing */
+	do {
+		val_hi = readl(addr + 4);
+		val_lo = readl(addr);
+	} while (val_hi != readl(addr + 4));
+
+	val = (((u64)val_hi << 32) | val_lo);
+
+	return val;
+}
+
+/* Check if PMU supports 64-bit single copy atomic. */
+static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
+{
+	return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
+}
+
+/* Check if cycle counter is supported. */
+static inline bool supports_cycle_counter(const struct arm_cspmu *cspmu)
+{
+	return (cspmu->pmcfgr & PMCFGR_CC);
+}
+
+/* Get counter size, which is (PMCFGR_SIZE + 1). */
+static inline u32 counter_size(const struct arm_cspmu *cspmu)
+{
+	return FIELD_GET(PMCFGR_SIZE, cspmu->pmcfgr) + 1;
+}
+
+/* Get counter mask. */
+static inline u64 counter_mask(const struct arm_cspmu *cspmu)
+{
+	return GENMASK_ULL(counter_size(cspmu) - 1, 0);
+}
+
+/* Check if counter is implemented as 64-bit register. */
+static inline bool use_64b_counter_reg(const struct arm_cspmu *cspmu)
+{
+	return (counter_size(cspmu) > 32);
+}
+
+ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "event=0x%llx\n",
+			  (unsigned long long)eattr->var);
+}
+EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_event_show);
+
+/* Default event list. */
+static struct attribute *arm_cspmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute **
+arm_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
+{
+	return arm_cspmu_event_attrs;
+}
+
+static umode_t
+arm_cspmu_event_attr_is_visible(struct kobject *kobj,
+				struct attribute *attr, int unused)
+{
+	struct device *dev = kobj_to_dev(kobj);
+	struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
+	struct perf_pmu_events_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr.attr);
+
+	/* Hide cycle event if not supported */
+	if (!supports_cycle_counter(cspmu) &&
+	    eattr->id == ARM_CSPMU_EVT_CYCLES_DEFAULT)
+		return 0;
+
+	return attr->mode;
+}
+
+ssize_t arm_cspmu_sysfs_format_show(struct device *dev,
+				struct device_attribute *attr,
+				char *buf)
+{
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
+}
+EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_format_show);
+
+static struct attribute *arm_cspmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_FILTER_ATTR,
+	NULL,
+};
+
+static struct attribute **
+arm_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
+{
+	return arm_cspmu_format_attrs;
+}
+
+static u32 arm_cspmu_event_type(const struct perf_event *event)
+{
+	return event->attr.config & ARM_CSPMU_EVENT_MASK;
+}
+
+static bool arm_cspmu_is_cycle_counter_event(const struct perf_event *event)
+{
+	return (event->attr.config == ARM_CSPMU_EVT_CYCLES_DEFAULT);
+}
+
+static u32 arm_cspmu_event_filter(const struct perf_event *event)
+{
+	return event->attr.config1 & ARM_CSPMU_FILTER_MASK;
+}
+
+static ssize_t arm_cspmu_identifier_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *page)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
+
+	return sysfs_emit(page, "%s\n", cspmu->identifier);
+}
+
+static struct device_attribute arm_cspmu_identifier_attr =
+	__ATTR(identifier, 0444, arm_cspmu_identifier_show, NULL);
+
+static struct attribute *arm_cspmu_identifier_attrs[] = {
+	&arm_cspmu_identifier_attr.attr,
+	NULL,
+};
+
+static struct attribute_group arm_cspmu_identifier_attr_group = {
+	.attrs = arm_cspmu_identifier_attrs,
+};
+
+static const char *arm_cspmu_get_identifier(const struct arm_cspmu *cspmu)
+{
+	const char *identifier =
+		devm_kasprintf(cspmu->dev, GFP_KERNEL, "%x",
+			       cspmu->impl.pmiidr);
+	return identifier;
+}
+
+static const char *arm_cspmu_type_str[ACPI_APMT_NODE_TYPE_COUNT] = {
+	"mc",
+	"smmu",
+	"pcie",
+	"acpi",
+	"cache",
+};
+
+static const char *arm_cspmu_get_name(const struct arm_cspmu *cspmu)
+{
+	struct device *dev;
+	struct acpi_apmt_node *apmt_node;
+	u8 pmu_type;
+	char *name;
+	char acpi_hid_string[ACPI_ID_LEN] = { 0 };
+	static atomic_t pmu_idx[ACPI_APMT_NODE_TYPE_COUNT] = { 0 };
+
+	dev = cspmu->dev;
+	apmt_node = cspmu->apmt_node;
+	pmu_type = apmt_node->type;
+
+	if (pmu_type >= ACPI_APMT_NODE_TYPE_COUNT) {
+		dev_err(dev, "unsupported PMU type-%u\n", pmu_type);
+		return NULL;
+	}
+
+	if (pmu_type == ACPI_APMT_NODE_TYPE_ACPI) {
+		memcpy(acpi_hid_string,
+			&apmt_node->inst_primary,
+			sizeof(apmt_node->inst_primary));
+		name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%s_%u", PMUNAME,
+				      arm_cspmu_type_str[pmu_type],
+				      acpi_hid_string,
+				      apmt_node->inst_secondary);
+	} else {
+		name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%d", PMUNAME,
+				      arm_cspmu_type_str[pmu_type],
+				      atomic_fetch_inc(&pmu_idx[pmu_type]));
+	}
+
+	return name;
+}
+
+static ssize_t arm_cspmu_cpumask_show(struct device *dev,
+				      struct device_attribute *attr,
+				      char *buf)
+{
+	struct pmu *pmu = dev_get_drvdata(dev);
+	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
+	struct dev_ext_attribute *eattr =
+		container_of(attr, struct dev_ext_attribute, attr);
+	unsigned long mask_id = (unsigned long)eattr->var;
+	const cpumask_t *cpumask;
+
+	switch (mask_id) {
+	case ARM_CSPMU_ACTIVE_CPU_MASK:
+		cpumask = &cspmu->active_cpu;
+		break;
+	case ARM_CSPMU_ASSOCIATED_CPU_MASK:
+		cpumask = &cspmu->associated_cpus;
+		break;
+	default:
+		return 0;
+	}
+	return cpumap_print_to_pagebuf(true, buf, cpumask);
+}
+
+static struct attribute *arm_cspmu_cpumask_attrs[] = {
+	ARM_CSPMU_CPUMASK_ATTR(cpumask, ARM_CSPMU_ACTIVE_CPU_MASK),
+	ARM_CSPMU_CPUMASK_ATTR(associated_cpus, ARM_CSPMU_ASSOCIATED_CPU_MASK),
+	NULL,
+};
+
+static struct attribute_group arm_cspmu_cpumask_attr_group = {
+	.attrs = arm_cspmu_cpumask_attrs,
+};
+
+struct impl_match {
+	u32 pmiidr;
+	u32 mask;
+	int (*impl_init_ops)(struct arm_cspmu *cspmu);
+};
+
+static const struct impl_match impl_match[] = {
+	{}
+};
+
+static int arm_cspmu_init_impl_ops(struct arm_cspmu *cspmu)
+{
+	int ret;
+	struct acpi_apmt_node *apmt_node = cspmu->apmt_node;
+	struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
+	const struct impl_match *match = impl_match;
+
+	/*
+	 * Get the PMU implementer and product id from the APMT node.
+	 * If the APMT node doesn't have an implementer/product id, try to
+	 * get it from PMIIDR.
+	 */
+	cspmu->impl.pmiidr =
+		(apmt_node->impl_id) ? apmt_node->impl_id :
+				       readl(cspmu->base0 + PMIIDR);
+
+	/* Find implementer specific attribute ops. */
+	for (; match->pmiidr; match++) {
+		const u32 mask = match->mask;
+
+		if ((match->pmiidr & mask) == (cspmu->impl.pmiidr & mask)) {
+			ret = match->impl_init_ops(cspmu);
+			if (ret)
+				return ret;
+
+			break;
+		}
+	}
+
+	/* Use default callbacks if implementer doesn't provide one. */
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_event_attrs);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_format_attrs);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_identifier);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_name);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, is_cycle_counter_event);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_type);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_filter);
+	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_attr_is_visible);
+
+	return 0;
+}
+
+static struct attribute_group *
+arm_cspmu_alloc_event_attr_group(struct arm_cspmu *cspmu)
+{
+	struct attribute_group *event_group;
+	struct device *dev = cspmu->dev;
+	const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
+
+	event_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!event_group)
+		return NULL;
+
+	event_group->name = "events";
+	event_group->attrs = impl_ops->get_event_attrs(cspmu);
+	event_group->is_visible = impl_ops->event_attr_is_visible;
+
+	return event_group;
+}
+
+static struct attribute_group *
+arm_cspmu_alloc_format_attr_group(struct arm_cspmu *cspmu)
+{
+	struct attribute_group *format_group;
+	struct device *dev = cspmu->dev;
+
+	format_group =
+		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
+	if (!format_group)
+		return NULL;
+
+	format_group->name = "format";
+	format_group->attrs = cspmu->impl.ops.get_format_attrs(cspmu);
+
+	return format_group;
+}
+
+static struct attribute_group **
+arm_cspmu_alloc_attr_group(struct arm_cspmu *cspmu)
+{
+	struct attribute_group **attr_groups = NULL;
+	struct device *dev = cspmu->dev;
+	const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
+	int ret;
+
+	ret = arm_cspmu_init_impl_ops(cspmu);
+	if (ret)
+		return NULL;
+
+	cspmu->identifier = impl_ops->get_identifier(cspmu);
+	cspmu->name = impl_ops->get_name(cspmu);
+
+	if (!cspmu->identifier || !cspmu->name)
+		return NULL;
+
+	attr_groups = devm_kcalloc(dev, 5, sizeof(struct attribute_group *),
+				   GFP_KERNEL);
+	if (!attr_groups)
+		return NULL;
+
+	attr_groups[0] = arm_cspmu_alloc_event_attr_group(cspmu);
+	attr_groups[1] = arm_cspmu_alloc_format_attr_group(cspmu);
+	attr_groups[2] = &arm_cspmu_identifier_attr_group;
+	attr_groups[3] = &arm_cspmu_cpumask_attr_group;
+
+	if (!attr_groups[0] || !attr_groups[1])
+		return NULL;
+
+	return attr_groups;
+}
+
+static inline void arm_cspmu_reset_counters(struct arm_cspmu *cspmu)
+{
+	u32 pmcr = 0;
+
+	pmcr |= PMCR_P;
+	pmcr |= PMCR_C;
+	writel(pmcr, cspmu->base0 + PMCR);
+}
+
+static inline void arm_cspmu_start_counters(struct arm_cspmu *cspmu)
+{
+	writel(PMCR_E, cspmu->base0 + PMCR);
+}
+
+static inline void arm_cspmu_stop_counters(struct arm_cspmu *cspmu)
+{
+	writel(0, cspmu->base0 + PMCR);
+}
+
+static void arm_cspmu_enable(struct pmu *pmu)
+{
+	bool disabled;
+	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
+
+	disabled = bitmap_empty(cspmu->hw_events.used_ctrs,
+				cspmu->num_logical_ctrs);
+
+	if (disabled)
+		return;
+
+	arm_cspmu_start_counters(cspmu);
+}
+
+static void arm_cspmu_disable(struct pmu *pmu)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
+
+	arm_cspmu_stop_counters(cspmu);
+}
+
+static int arm_cspmu_get_event_idx(struct arm_cspmu_hw_events *hw_events,
+				struct perf_event *event)
+{
+	int idx;
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+
+	if (supports_cycle_counter(cspmu)) {
+		if (cspmu->impl.ops.is_cycle_counter_event(event)) {
+			/* Search for available cycle counter. */
+			if (test_and_set_bit(cspmu->cycle_counter_logical_idx,
+					     hw_events->used_ctrs))
+				return -EAGAIN;
+
+			return cspmu->cycle_counter_logical_idx;
+		}
+
+		/*
+		 * Search for a regular counter in the used counter bitmap.
+		 * The cycle counter divides the bitmap into two parts. Search
+		 * the first then second half to exclude the cycle counter bit.
+		 */
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  cspmu->cycle_counter_logical_idx);
+		if (idx >= cspmu->cycle_counter_logical_idx) {
+			idx = find_next_zero_bit(
+				hw_events->used_ctrs,
+				cspmu->num_logical_ctrs,
+				cspmu->cycle_counter_logical_idx + 1);
+		}
+	} else {
+		idx = find_first_zero_bit(hw_events->used_ctrs,
+					  cspmu->num_logical_ctrs);
+	}
+
+	if (idx >= cspmu->num_logical_ctrs)
+		return -EAGAIN;
+
+	set_bit(idx, hw_events->used_ctrs);
+
+	return idx;
+}
+
+static bool arm_cspmu_validate_event(struct pmu *pmu,
+				 struct arm_cspmu_hw_events *hw_events,
+				 struct perf_event *event)
+{
+	if (is_software_event(event))
+		return true;
+
+	/* Reject groups spanning multiple HW PMUs. */
+	if (event->pmu != pmu)
+		return false;
+
+	return (arm_cspmu_get_event_idx(hw_events, event) >= 0);
+}
+
+/*
+ * Make sure the group of events can be scheduled at once
+ * on the PMU.
+ */
+static bool arm_cspmu_validate_group(struct perf_event *event)
+{
+	struct perf_event *sibling, *leader = event->group_leader;
+	struct arm_cspmu_hw_events fake_hw_events;
+
+	if (event->group_leader == event)
+		return true;
+
+	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
+
+	if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events, leader))
+		return false;
+
+	for_each_sibling_event(sibling, leader) {
+		if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events,
+						  sibling))
+			return false;
+	}
+
+	return arm_cspmu_validate_event(event->pmu, &fake_hw_events, event);
+}
+
+static int arm_cspmu_event_init(struct perf_event *event)
+{
+	struct arm_cspmu *cspmu;
+	struct hw_perf_event *hwc = &event->hw;
+
+	cspmu = to_arm_cspmu(event->pmu);
+
+	/*
+	 * Following other "uncore" PMUs, we do not support sampling mode or
+	 * attaching to a task (per-process mode).
+	 */
+	if (is_sampling_event(event)) {
+		dev_dbg(cspmu->pmu.dev,
+			"Can't support sampling events\n");
+		return -EOPNOTSUPP;
+	}
+
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
+		dev_dbg(cspmu->pmu.dev,
+			"Can't support per-task counters\n");
+		return -EINVAL;
+	}
+
+	/*
+	 * Make sure the CPU assignment is on one of the CPUs associated with
+	 * this PMU.
+	 */
+	if (!cpumask_test_cpu(event->cpu, &cspmu->associated_cpus)) {
+		dev_dbg(cspmu->pmu.dev,
+			"Requested cpu is not associated with the PMU\n");
+		return -EINVAL;
+	}
+
+	/* Enforce the current active CPU to handle the events in this PMU. */
+	event->cpu = cpumask_first(&cspmu->active_cpu);
+	if (event->cpu >= nr_cpu_ids)
+		return -EINVAL;
+
+	if (!arm_cspmu_validate_group(event))
+		return -EINVAL;
+
+	/*
+	 * The logical counter id is tracked with hw_perf_event.extra_reg.idx.
+	 * The physical counter id is tracked with hw_perf_event.idx.
+	 * We don't assign an index until we actually place the event onto
+	 * hardware. Use -1 to signify that we haven't decided where to put it
+	 * yet.
+	 */
+	hwc->idx = -1;
+	hwc->extra_reg.idx = -1;
+	hwc->config = cspmu->impl.ops.event_type(event);
+
+	return 0;
+}
+
+static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
+{
+	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
+}
+
+static void arm_cspmu_write_counter(struct perf_event *event, u64 val)
+{
+	u32 offset;
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+
+	if (use_64b_counter_reg(cspmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+
+		writeq(val, cspmu->base1 + offset);
+	} else {
+		offset = counter_offset(sizeof(u32), event->hw.idx);
+
+		writel(lower_32_bits(val), cspmu->base1 + offset);
+	}
+}
+
+static u64 arm_cspmu_read_counter(struct perf_event *event)
+{
+	u32 offset;
+	const void __iomem *counter_addr;
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+
+	if (use_64b_counter_reg(cspmu)) {
+		offset = counter_offset(sizeof(u64), event->hw.idx);
+		counter_addr = cspmu->base1 + offset;
+
+		return supports_64bit_atomics(cspmu) ?
+			       readq(counter_addr) :
+			       read_reg64_hilohi(counter_addr);
+	}
+
+	offset = counter_offset(sizeof(u32), event->hw.idx);
+	return readl(cspmu->base1 + offset);
+}
+
+/*
+ * arm_cspmu_set_event_period: Set the period for the counter.
+ *
+ * To handle cases of extreme interrupt latency, we program
+ * the counter with half of the max count for the counters.
+ */
+static void arm_cspmu_set_event_period(struct perf_event *event)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	u64 val = counter_mask(cspmu) >> 1ULL;
+
+	local64_set(&event->hw.prev_count, val);
+	arm_cspmu_write_counter(event, val);
+}
+
+static void arm_cspmu_enable_counter(struct arm_cspmu *cspmu, int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = COUNTER_TO_SET_CLR_ID(idx);
+	reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
+
+	inten_off = PMINTENSET + (4 * reg_id);
+	cnten_off = PMCNTENSET + (4 * reg_id);
+
+	writel(BIT(reg_bit), cspmu->base0 + inten_off);
+	writel(BIT(reg_bit), cspmu->base0 + cnten_off);
+}
+
+static void arm_cspmu_disable_counter(struct arm_cspmu *cspmu, int idx)
+{
+	u32 reg_id, reg_bit, inten_off, cnten_off;
+
+	reg_id = COUNTER_TO_SET_CLR_ID(idx);
+	reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
+
+	inten_off = PMINTENCLR + (4 * reg_id);
+	cnten_off = PMCNTENCLR + (4 * reg_id);
+
+	writel(BIT(reg_bit), cspmu->base0 + cnten_off);
+	writel(BIT(reg_bit), cspmu->base0 + inten_off);
+}
+
+static void arm_cspmu_event_update(struct perf_event *event)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u64 delta, prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+		now = arm_cspmu_read_counter(event);
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	delta = (now - prev) & counter_mask(cspmu);
+	local64_add(delta, &event->count);
+}
+
+static inline void arm_cspmu_set_event(struct arm_cspmu *cspmu,
+					struct hw_perf_event *hwc)
+{
+	u32 offset = PMEVTYPER + (4 * hwc->idx);
+
+	writel(hwc->config, cspmu->base0 + offset);
+}
+
+static inline void arm_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
+					   struct hw_perf_event *hwc,
+					   u32 filter)
+{
+	u32 offset = PMEVFILTR + (4 * hwc->idx);
+
+	writel(filter, cspmu->base0 + offset);
+}
+
+static inline void arm_cspmu_set_cc_filter(struct arm_cspmu *cspmu, u32 filter)
+{
+	u32 offset = PMCCFILTR;
+
+	writel(filter, cspmu->base0 + offset);
+}
+
+static void arm_cspmu_start(struct perf_event *event, int pmu_flags)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	u32 filter;
+
+	/* We always reprogram the counter */
+	if (pmu_flags & PERF_EF_RELOAD)
+		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
+
+	arm_cspmu_set_event_period(event);
+
+	filter = cspmu->impl.ops.event_filter(event);
+
+	if (event->hw.extra_reg.idx == cspmu->cycle_counter_logical_idx) {
+		arm_cspmu_set_cc_filter(cspmu, filter);
+	} else {
+		arm_cspmu_set_event(cspmu, hwc);
+		arm_cspmu_set_ev_filter(cspmu, hwc, filter);
+	}
+
+	hwc->state = 0;
+
+	arm_cspmu_enable_counter(cspmu, hwc->idx);
+}
+
+static void arm_cspmu_stop(struct perf_event *event, int pmu_flags)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (hwc->state & PERF_HES_STOPPED)
+		return;
+
+	arm_cspmu_disable_counter(cspmu, hwc->idx);
+	arm_cspmu_event_update(event);
+
+	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static inline u32 to_phys_idx(struct arm_cspmu *cspmu, u32 idx)
+{
+	return (idx == cspmu->cycle_counter_logical_idx) ?
+		ARM_CSPMU_CYCLE_CNTR_IDX : idx;
+}
+
+static int arm_cspmu_add(struct perf_event *event, int flags)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx;
+
+	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
+					   &cspmu->associated_cpus)))
+		return -ENOENT;
+
+	idx = arm_cspmu_get_event_idx(hw_events, event);
+	if (idx < 0)
+		return idx;
+
+	hw_events->events[idx] = event;
+	hwc->idx = to_phys_idx(cspmu, idx);
+	hwc->extra_reg.idx = idx;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		arm_cspmu_start(event, PERF_EF_RELOAD);
+
+	/* Propagate changes to the userspace mapping. */
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void arm_cspmu_del(struct perf_event *event, int flags)
+{
+	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
+	struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
+	struct hw_perf_event *hwc = &event->hw;
+	int idx = hwc->extra_reg.idx;
+
+	arm_cspmu_stop(event, PERF_EF_UPDATE);
+
+	hw_events->events[idx] = NULL;
+
+	clear_bit(idx, hw_events->used_ctrs);
+
+	perf_event_update_userpage(event);
+}
+
+static void arm_cspmu_read(struct perf_event *event)
+{
+	arm_cspmu_event_update(event);
+}
+
+static struct arm_cspmu *arm_cspmu_alloc(struct platform_device *pdev)
+{
+	struct acpi_apmt_node *apmt_node;
+	struct arm_cspmu *cspmu;
+	struct device *dev;
+
+	dev = &pdev->dev;
+	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
+	if (!apmt_node) {
+		dev_err(dev, "failed to get APMT node\n");
+		return NULL;
+	}
+
+	cspmu = devm_kzalloc(dev, sizeof(*cspmu), GFP_KERNEL);
+	if (!cspmu)
+		return NULL;
+
+	cspmu->dev = dev;
+	cspmu->apmt_node = apmt_node;
+
+	platform_set_drvdata(pdev, cspmu);
+
+	return cspmu;
+}
+
+static int arm_cspmu_init_mmio(struct arm_cspmu *cspmu)
+{
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = cspmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = cspmu->apmt_node;
+
+	/* Base address for page 0. */
+	cspmu->base0 = devm_platform_ioremap_resource(pdev, 0);
+	if (IS_ERR(cspmu->base0)) {
+		dev_err(dev, "ioremap failed for page-0 resource\n");
+		return PTR_ERR(cspmu->base0);
+	}
+
+	/* Base address for page 1 if supported. Otherwise point to page 0. */
+	cspmu->base1 = cspmu->base0;
+	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
+		cspmu->base1 = devm_platform_ioremap_resource(pdev, 1);
+		if (IS_ERR(cspmu->base1)) {
+			dev_err(dev, "ioremap failed for page-1 resource\n");
+			return PTR_ERR(cspmu->base1);
+		}
+	}
+
+	cspmu->pmcfgr = readl(cspmu->base0 + PMCFGR);
+
+	cspmu->num_logical_ctrs = FIELD_GET(PMCFGR_N, cspmu->pmcfgr) + 1;
+
+	cspmu->cycle_counter_logical_idx = ARM_CSPMU_MAX_HW_CNTRS;
+
+	if (supports_cycle_counter(cspmu)) {
+		/*
+		 * The last logical counter is mapped to the cycle counter if
+		 * there is a gap between the regular counters and the cycle
+		 * counter. Otherwise, logical and physical counters have a
+		 * 1-to-1 mapping.
+		 */
+		cspmu->cycle_counter_logical_idx =
+			(cspmu->num_logical_ctrs <= ARM_CSPMU_CYCLE_CNTR_IDX) ?
+				cspmu->num_logical_ctrs - 1 :
+				ARM_CSPMU_CYCLE_CNTR_IDX;
+	}
+
+	cspmu->num_set_clr_reg =
+		DIV_ROUND_UP(cspmu->num_logical_ctrs,
+				ARM_CSPMU_SET_CLR_COUNTER_NUM);
+
+	cspmu->hw_events.events =
+		devm_kcalloc(dev, cspmu->num_logical_ctrs,
+			     sizeof(*cspmu->hw_events.events), GFP_KERNEL);
+
+	if (!cspmu->hw_events.events)
+		return -ENOMEM;
+
+	return 0;
+}
+
+static inline int arm_cspmu_get_reset_overflow(struct arm_cspmu *cspmu,
+					       u32 *pmovs)
+{
+	int i;
+	u32 pmovclr_offset = PMOVSCLR;
+	u32 has_overflowed = 0;
+
+	for (i = 0; i < cspmu->num_set_clr_reg; ++i) {
+		pmovs[i] = readl(cspmu->base1 + pmovclr_offset);
+		has_overflowed |= pmovs[i];
+		writel(pmovs[i], cspmu->base1 + pmovclr_offset);
+		pmovclr_offset += sizeof(u32);
+	help
+	  Provides support for performance monitoring unit (PMU) devices
+	  based on the ARM CoreSight PMU architecture. Note that this PMU
+	  architecture is not related to ARM CoreSight Self-Hosted
+	  Tracing.
+static irqreturn_t arm_cspmu_handle_irq(int irq_num, void *dev)
+{
+	int idx, has_overflowed;
+	struct perf_event *event;
+	struct arm_cspmu *cspmu = dev;
+	u32 pmovs[ARM_CSPMU_SET_CLR_MAX_NUM] = { 0 };
+	bool handled = false;
+
+	arm_cspmu_stop_counters(cspmu);
+
+	has_overflowed = arm_cspmu_get_reset_overflow(cspmu, pmovs);
+	if (!has_overflowed)
+		goto done;
+
+	for_each_set_bit(idx, cspmu->hw_events.used_ctrs,
+			cspmu->num_logical_ctrs) {
+		event = cspmu->hw_events.events[idx];
+
+		if (!event)
+			continue;
+
+		if (!test_bit(event->hw.idx, (unsigned long *)pmovs))
+			continue;
+
+		arm_cspmu_event_update(event);
+		arm_cspmu_set_event_period(event);
+
+		handled = true;
+	}
+
+done:
+	arm_cspmu_start_counters(cspmu);
+	return IRQ_RETVAL(handled);
+}
+
+static int arm_cspmu_request_irq(struct arm_cspmu *cspmu)
+{
+	int irq, ret;
+	struct device *dev;
+	struct platform_device *pdev;
+	struct acpi_apmt_node *apmt_node;
+
+	dev = cspmu->dev;
+	pdev = to_platform_device(dev);
+	apmt_node = cspmu->apmt_node;
+
+	/* Skip IRQ request if the PMU does not support overflow interrupt. */
+	if (apmt_node->ovflw_irq == 0)
+		return 0;
+
+	irq = platform_get_irq(pdev, 0);
+	if (irq < 0)
+		return irq;
+
+	ret = devm_request_irq(dev, irq, arm_cspmu_handle_irq,
+			       IRQF_NOBALANCING | IRQF_NO_THREAD, dev_name(dev),
+			       cspmu);
+	if (ret) {
+		dev_err(dev, "Could not request IRQ %d\n", irq);
+		return ret;
+	}
+
+	cspmu->irq = irq;
+
+	return 0;
+}
+
+static inline int arm_cspmu_find_cpu_container(int cpu, u32 container_uid)
+{
+	u32 acpi_uid;
+	struct device *cpu_dev = get_cpu_device(cpu);
+	struct acpi_device *acpi_dev = ACPI_COMPANION(cpu_dev);
+
+	if (!cpu_dev)
+		return -ENODEV;
+
+	while (acpi_dev) {
+		if (!strcmp(acpi_device_hid(acpi_dev),
+			    ACPI_PROCESSOR_CONTAINER_HID) &&
+		    !kstrtouint(acpi_device_uid(acpi_dev), 0, &acpi_uid) &&
+		    acpi_uid == container_uid)
+			return 0;
+
+		acpi_dev = acpi_dev->parent;
+	}
+
+	return -ENODEV;
+}
+
+static int arm_cspmu_get_cpus(struct arm_cspmu *cspmu)
+{
+	struct acpi_apmt_node *apmt_node;
+	int affinity_flag;
+	int cpu;
+
+	apmt_node = cspmu->apmt_node;
+	affinity_flag = apmt_node->flags & ACPI_APMT_FLAGS_AFFINITY;
+
+	if (affinity_flag == ACPI_APMT_FLAGS_AFFINITY_PROC) {
+		for_each_possible_cpu(cpu) {
+			if (apmt_node->proc_affinity ==
+			    get_acpi_id_for_cpu(cpu)) {
+				cpumask_set_cpu(cpu, &cspmu->associated_cpus);
+				break;
+			}
+		}
+	} else {
+		for_each_possible_cpu(cpu) {
+			if (arm_cspmu_find_cpu_container(
+				    cpu, apmt_node->proc_affinity))
+				continue;
+
+			cpumask_set_cpu(cpu, &cspmu->associated_cpus);
+		}
+	}
+
+	return 0;
+}
+
+static int arm_cspmu_register_pmu(struct arm_cspmu *cspmu)
+{
+	int ret;
+	struct attribute_group **attr_groups;
+
+	attr_groups = arm_cspmu_alloc_attr_group(cspmu);
+	if (!attr_groups)
+		return -ENOMEM;
+
+	ret = cpuhp_state_add_instance(arm_cspmu_cpuhp_state,
+				       &cspmu->cpuhp_node);
+	if (ret)
+		return ret;
+
+	cspmu->pmu = (struct pmu){
+		.task_ctx_nr	= perf_invalid_context,
+		.module		= THIS_MODULE,
+		.pmu_enable	= arm_cspmu_enable,
+		.pmu_disable	= arm_cspmu_disable,
+		.event_init	= arm_cspmu_event_init,
+		.add		= arm_cspmu_add,
+		.del		= arm_cspmu_del,
+		.start		= arm_cspmu_start,
+		.stop		= arm_cspmu_stop,
+		.read		= arm_cspmu_read,
+		.attr_groups	= (const struct attribute_group **)attr_groups,
+		.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
+	};
+
+	/* Hardware counter init */
+	arm_cspmu_stop_counters(cspmu);
+	arm_cspmu_reset_counters(cspmu);
+
+	ret = perf_pmu_register(&cspmu->pmu, cspmu->name, -1);
+	if (ret) {
+		cpuhp_state_remove_instance(arm_cspmu_cpuhp_state,
+					    &cspmu->cpuhp_node);
+	}
+
+	return ret;
+}
+
+static int arm_cspmu_device_probe(struct platform_device *pdev)
+{
+	int ret;
+	struct arm_cspmu *cspmu;
+
+	cspmu = arm_cspmu_alloc(pdev);
+	if (!cspmu)
+		return -ENOMEM;
+
+	ret = arm_cspmu_init_mmio(cspmu);
+	if (ret)
+		return ret;
+
+	ret = arm_cspmu_request_irq(cspmu);
+	if (ret)
+		return ret;
+
+	ret = arm_cspmu_get_cpus(cspmu);
+	if (ret)
+		return ret;
+
+	ret = arm_cspmu_register_pmu(cspmu);
+	if (ret)
+		return ret;
+
+	return 0;
+}
+
+static int arm_cspmu_device_remove(struct platform_device *pdev)
+{
+	struct arm_cspmu *cspmu = platform_get_drvdata(pdev);
+
+	perf_pmu_unregister(&cspmu->pmu);
+	cpuhp_state_remove_instance(arm_cspmu_cpuhp_state, &cspmu->cpuhp_node);
+
+	return 0;
+}
+
+static struct platform_driver arm_cspmu_driver = {
+	.driver = {
+			.name = DRVNAME,
+			.suppress_bind_attrs = true,
+		},
+	.probe = arm_cspmu_device_probe,
+	.remove = arm_cspmu_device_remove,
+};
+
+static void arm_cspmu_set_active_cpu(int cpu, struct arm_cspmu *cspmu)
+{
+	cpumask_set_cpu(cpu, &cspmu->active_cpu);
+	WARN_ON(irq_set_affinity(cspmu->irq, &cspmu->active_cpu));
+}
+
+static int arm_cspmu_cpu_online(unsigned int cpu, struct hlist_node *node)
+{
+	struct arm_cspmu *cspmu =
+		hlist_entry_safe(node, struct arm_cspmu, cpuhp_node);
+
+	if (!cpumask_test_cpu(cpu, &cspmu->associated_cpus))
+		return 0;
+
+	/* If the PMU is already managed, there is nothing to do */
+	if (!cpumask_empty(&cspmu->active_cpu))
+		return 0;
+
+	/* Use this CPU for event counting */
+	arm_cspmu_set_active_cpu(cpu, cspmu);
+
+	return 0;
+}
+
+static int arm_cspmu_cpu_teardown(unsigned int cpu, struct hlist_node *node)
+{
+	int dst;
+	struct cpumask online_supported;
+
+	struct arm_cspmu *cspmu =
+		hlist_entry_safe(node, struct arm_cspmu, cpuhp_node);
+
+	/* Nothing to do if this CPU doesn't own the PMU */
+	if (!cpumask_test_and_clear_cpu(cpu, &cspmu->active_cpu))
+		return 0;
+
+	/* Choose a new CPU to migrate ownership of the PMU to */
+	cpumask_and(&online_supported, &cspmu->associated_cpus,
+		    cpu_online_mask);
+	dst = cpumask_any_but(&online_supported, cpu);
+	if (dst >= nr_cpu_ids)
+		return 0;
+
+	/* Use this CPU for event counting */
+	perf_pmu_migrate_context(&cspmu->pmu, cpu, dst);
+	arm_cspmu_set_active_cpu(dst, cspmu);
+
+	return 0;
+}
+
+static int __init arm_cspmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+					"perf/arm/cspmu:online",
+					arm_cspmu_cpu_online,
+					arm_cspmu_cpu_teardown);
+	if (ret < 0)
+		return ret;
+	arm_cspmu_cpuhp_state = ret;
+	return platform_driver_register(&arm_cspmu_driver);
+}
+
+static void __exit arm_cspmu_exit(void)
+{
+	platform_driver_unregister(&arm_cspmu_driver);
+	cpuhp_remove_multi_state(arm_cspmu_cpuhp_state);
+}
+
+module_init(arm_cspmu_init);
+module_exit(arm_cspmu_exit);
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.h b/drivers/perf/arm_cspmu/arm_cspmu.h
new file mode 100644
index 000000000000..f1d7b2c9ade3
--- /dev/null
+++ b/drivers/perf/arm_cspmu/arm_cspmu.h
@@ -0,0 +1,151 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * ARM CoreSight Architecture PMU driver.
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+#ifndef __ARM_CSPMU_H__
+#define __ARM_CSPMU_H__
+
+#include <linux/acpi.h>
+#include <linux/bitfield.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <linux/platform_device.h>
+#include <linux/types.h>
+
+#define to_arm_cspmu(p) (container_of(p, struct arm_cspmu, pmu))
+
+#define ARM_CSPMU_EXT_ATTR(_name, _func, _config)			\
+	(&((struct dev_ext_attribute[]){				\
+		{							\
+			.attr = __ATTR(_name, 0444, _func, NULL),	\
+			.var = (void *)_config				\
+		}							\
+	})[0].attr.attr)
+
+#define ARM_CSPMU_FORMAT_ATTR(_name, _config)				\
+	ARM_CSPMU_EXT_ATTR(_name, arm_cspmu_sysfs_format_show, (char *)_config)
+
+#define ARM_CSPMU_EVENT_ATTR(_name, _config)				\
+	PMU_EVENT_ATTR_ID(_name, arm_cspmu_sysfs_event_show, _config)
+
+
+/* Default event id mask */
+#define ARM_CSPMU_EVENT_MASK	GENMASK_ULL(63, 0)
+
+/* Default filter value mask */
+#define ARM_CSPMU_FILTER_MASK	GENMASK_ULL(63, 0)
+
+/* Default event format */
+#define ARM_CSPMU_FORMAT_EVENT_ATTR	\
+	ARM_CSPMU_FORMAT_ATTR(event, "config:0-32")
+
+/* Default filter format */
+#define ARM_CSPMU_FORMAT_FILTER_ATTR	\
+	ARM_CSPMU_FORMAT_ATTR(filter, "config1:0-31")
+
+/*
+ * This is the default event number for cycle count, if supported, since the
+ * ARM CoreSight PMU specification does not define a standard event code
+ * for cycle count.
+ */
+#define ARM_CSPMU_EVT_CYCLES_DEFAULT	(0x1ULL << 32)
+
+/*
+ * The ARM CoreSight PMU supports up to 256 event counters.
+ * If the counters are larger than 32 bits, then the PMU includes at
+ * most 128 counters.
+ */
+#define ARM_CSPMU_MAX_HW_CNTRS		256
+
+/* The cycle counter, if implemented, is located at counter[31]. */
+#define ARM_CSPMU_CYCLE_CNTR_IDX	31
+
+/* PMIIDR register field */
+#define ARM_CSPMU_PMIIDR_IMPLEMENTER	GENMASK(11, 0)
+#define ARM_CSPMU_PMIIDR_PRODUCTID	GENMASK(31, 20)
+
+struct arm_cspmu;
+
+/* This tracks the events assigned to each counter in the PMU. */
+struct arm_cspmu_hw_events {
+	/* The events that are active on the PMU for a given logical index. */
+	struct perf_event **events;
+
+	/*
+	 * Each bit indicates whether a logical counter is in use (or not) by
+	 * an event. If the cycle counter is supported and there is a gap
+	 * between the regular counters and the cycle counter, the last
+	 * logical counter is mapped to the cycle counter. Otherwise, logical
+	 * and physical counters have a 1-to-1 mapping.
+	 */
+	DECLARE_BITMAP(used_ctrs, ARM_CSPMU_MAX_HW_CNTRS);
+};
+
+/* Contains ops to query vendor/implementer specific attributes. */
+struct arm_cspmu_impl_ops {
+	/* Get event attributes */
+	struct attribute **(*get_event_attrs)(const struct arm_cspmu *cspmu);
+	/* Get format attributes */
+	struct attribute **(*get_format_attrs)(const struct arm_cspmu *cspmu);
+	/* Get string identifier */
+	const char *(*get_identifier)(const struct arm_cspmu *cspmu);
+	/* Get PMU name to register to core perf */
+	const char *(*get_name)(const struct arm_cspmu *cspmu);
+	/* Check if the event corresponds to the cycle count event */
+	bool (*is_cycle_counter_event)(const struct perf_event *event);
+	/* Decode event type/id from configs */
+	u32 (*event_type)(const struct perf_event *event);
+	/* Decode filter value from configs */
+	u32 (*event_filter)(const struct perf_event *event);
+	/* Hide/show unsupported events */
+	umode_t (*event_attr_is_visible)(struct kobject *kobj,
+					 struct attribute *attr, int unused);
+};
+
+/* Vendor/implementer descriptor. */
+struct arm_cspmu_impl {
+	u32 pmiidr;
+	struct arm_cspmu_impl_ops ops;
+	void *ctx;
+};
+
+/* Coresight PMU descriptor. */
+struct arm_cspmu {
+	struct pmu pmu;
+	struct device *dev;
+	struct acpi_apmt_node *apmt_node;
+	const char *name;
+	const char *identifier;
+	void __iomem *base0;
+	void __iomem *base1;
+	int irq;
+	cpumask_t associated_cpus;
+	cpumask_t active_cpu;
+	struct hlist_node cpuhp_node;
+
+	u32 pmcfgr;
+	u32 num_logical_ctrs;
+	u32 num_set_clr_reg;
+	int cycle_counter_logical_idx;
+
+	struct arm_cspmu_hw_events hw_events;
+
+	struct arm_cspmu_impl impl;
+};
+
+/* Default function to show event attribute in sysfs. */
+ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf);
+
+/* Default function to show format attribute in sysfs. */
+ssize_t arm_cspmu_sysfs_format_show(struct device *dev,
+				    struct device_attribute *attr,
+				    char *buf);
+
+#endif /* __ARM_CSPMU_H__ */
-- 
2.17.1



* [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  2022-08-14 18:23 [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-08-14 18:23 ` Besar Wicaksono
  2022-09-27 11:42   ` Suzuki K Poulose
  2022-08-23 17:24 ` [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2 siblings, 1 reply; 13+ messages in thread
From: Besar Wicaksono @ 2022-08-14 18:23 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan, Besar Wicaksono

Add support for NVIDIA Scalable Coherency Fabric (SCF) and Memory
Controller Fabric (MCF) PMU attributes for the CoreSight PMU
implementation in NVIDIA devices.

Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
---
 Documentation/admin-guide/perf/index.rst      |   1 +
 Documentation/admin-guide/perf/nvidia-pmu.rst | 120 ++++++
 drivers/perf/arm_cspmu/Makefile               |   3 +-
 drivers/perf/arm_cspmu/arm_cspmu.c            |   7 +
 drivers/perf/arm_cspmu/nvidia_cspmu.c         | 367 ++++++++++++++++++
 drivers/perf/arm_cspmu/nvidia_cspmu.h         |  17 +
 6 files changed, 514 insertions(+), 1 deletion(-)
 create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
 create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h

diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 69b23f087c05..cf05fed1f67f 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -17,3 +17,4 @@ Performance monitor support
    xgene-pmu
    arm_dsu_pmu
    thunderx2-pmu
+   nvidia-pmu
diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
new file mode 100644
index 000000000000..c41b93965824
--- /dev/null
+++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
@@ -0,0 +1,120 @@
+=========================================================
+NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
+=========================================================
+
+The NVIDIA Tegra SoC includes various system PMUs to measure key performance
+metrics like memory bandwidth, latency, and utilization:
+
+* Scalable Coherency Fabric (SCF)
+* Memory Controller Fabric (MCF) GPU physical interface
+* MCF GPU virtual interface
+* MCF NVLINK interface
+* MCF PCIE interface
+
+PMU Driver
+----------
+
+The PMUs in this document are based on the ARM CoreSight PMU architecture as
+described in document ARM IHI 0091. Since this is a standard architecture, the
+PMUs are managed by a common driver, "arm-cs-arch-pmu". This driver describes
+the available events and configuration options of each PMU in sysfs. Please
+see the sections below to get the sysfs path of each PMU. Like other uncore
+PMU drivers, the driver provides a "cpumask" sysfs attribute to show the CPU
+id used to handle the PMU events. There is also an "associated_cpus" sysfs
+attribute, which contains a list of CPUs associated with the PMU instance.
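+
+For example, assuming an SCF PMU instance named nvidia_scf_pmu_0 is present,
+its attributes can be inspected with::
+
+  ls /sys/bus/event_source/devices/nvidia_scf_pmu_0/events
+  cat /sys/bus/event_source/devices/nvidia_scf_pmu_0/cpumask
+  cat /sys/bus/event_source/devices/nvidia_scf_pmu_0/associated_cpus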
+
+SCF PMU
+-------
+
+The SCF PMU monitors system level cache events, CPU traffic, and
+strongly-ordered PCIE traffic to local/remote memory.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
+
+Example usage::
+
+  perf stat -a -e nvidia_scf_pmu_0/config=0x0/
+
+This will count the events in socket 0.
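+
+Events can also be selected by the names listed in the sysfs "events"
+directory; for example, counting the bus_cycles event (id 0x1d) exported by
+this driver::
+
+  perf stat -a -e nvidia_scf_pmu_0/bus_cycles/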
+
+MCF GPU Physical PMU
+--------------------
+
+The MCF GPU physical PMU monitors ATS translated traffic from GPU to
+local/remote memory via Nvlink C2C.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_mcf_gpu_pmu_<socket-id>.
+
+Multiple GPUs can be connected to the SoC. The user can use the "gpu" bitmap
+parameter to select the GPU(s) to monitor, e.g. "gpu=0xF" corresponds to GPUs 0
+to 3. /sys/bus/event_source/devices/nvidia_mcf_gpu_pmu_<socket-id>/format/gpu
+shows the valid bits that can be set in the "gpu" parameter.
+
+Example usage::
+
+  perf stat -a -e nvidia_mcf_gpu_pmu_0/config=0x0,gpu=0x3/
+
+This will count the events on GPUs 0 and 1 that are connected to the SoC in
+socket 0.
+
+MCF GPU Virtual PMU
+-------------------
+
+The MCF GPU virtual PMU monitors SMMU inline translated traffic (as opposed to
+ATS) from GPU to local/remote memory via Nvlink C2C.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_mcf_gpuvir_pmu_<socket-id>.
+
+Multiple GPUs can be connected to the SoC. The user can use the "gpu" bitmap
+parameter to select the GPU(s) to monitor, e.g. "gpu=0xF" corresponds to GPUs 0
+to 3. /sys/bus/event_source/devices/nvidia_mcf_gpuvir_pmu_<socket-id>/format/gpu
+shows the valid bits that can be set in the "gpu" parameter.
+
+Example usage::
+
+  perf stat -a -e nvidia_mcf_gpuvir_pmu_0/config=0x0,gpu=0x3/
+
+This will count the events on GPUs 0 and 1 that are connected to the SoC in
+socket 0.
+
+MCF NVLINK PMU
+--------------
+
+The MCF NVLINK PMU monitors I/O coherent traffic from external socket to local
+memory.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_mcf_nvlink_pmu_<socket-id>.
+
+Each SoC socket can be connected to one or more sockets via NVLINK. The user
+can use the "rem_socket" bitmap parameter to select the remote socket(s) to
+monitor, e.g. "rem_socket=0xE" corresponds to sockets 1 to 3.
+/sys/bus/event_source/devices/nvidia_mcf_nvlink_pmu_<socket-id>/format/rem_socket
+shows the valid bits that can be set in the "rem_socket" parameter.
+
+Example usage::
+
+  perf stat -a -e nvidia_mcf_nvlink_pmu_0/config=0x0,rem_socket=0x6/
+
+This will count the events from remote sockets 1 and 2 to socket 0.
+
+MCF PCIE PMU
+------------
+
+The MCF PCIE PMU monitors traffic from PCIE root ports to local/remote memory.
+
+The events and configuration options of this PMU device are described in sysfs,
+see /sys/bus/event_source/devices/nvidia_mcf_pcie_pmu_<socket-id>.
+
+Each SoC socket can support multiple root ports. The user can use the
+"root_port" bitmap parameter to select the port(s) to monitor, e.g.
+"root_port=0xF" corresponds to root ports 0 to 3.
+/sys/bus/event_source/devices/nvidia_mcf_pcie_pmu_<socket-id>/format/root_port
+shows the valid bits that can be set in the "root_port" parameter.
+
+Example usage::
+
+  perf stat -a -e nvidia_mcf_pcie_pmu_0/config=0x0,root_port=0x3/
+
+This will count the events from root ports 0 and 1 of socket 0.
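+
+Multiple PMUs can also be combined in one invocation; for example, to count
+SCF events and PCIE root port 0 and 1 events on socket 0 at the same time::
+
+  perf stat -a -e nvidia_scf_pmu_0/config=0x0/ -e nvidia_mcf_pcie_pmu_0/config=0x0,root_port=0x3/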
diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
index cdc3455f74d8..1b586064bd77 100644
--- a/drivers/perf/arm_cspmu/Makefile
+++ b/drivers/perf/arm_cspmu/Makefile
@@ -3,4 +3,5 @@
 # SPDX-License-Identifier: GPL-2.0
 
 obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
-	arm_cspmu.o
+	arm_cspmu.o \
+	nvidia_cspmu.o
diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
index 410876f86eb0..7a0beb515e53 100644
--- a/drivers/perf/arm_cspmu/arm_cspmu.c
+++ b/drivers/perf/arm_cspmu/arm_cspmu.c
@@ -31,6 +31,7 @@
 #include <acpi/processor.h>
 
 #include "arm_cspmu.h"
+#include "nvidia_cspmu.h"
 
 #define PMUNAME "arm_cspmu"
 #define DRVNAME "arm-cs-arch-pmu"
@@ -118,6 +119,9 @@ static_assert(
 			ops->callback = arm_cspmu_ ## callback;	\
 	} while (0)
 
+/* JEDEC-assigned JEP106 identification code */
+#define ARM_CSPMU_IMPL_ID_NVIDIA		0x36B
+
 static unsigned long arm_cspmu_cpuhp_state;
 
 /*
@@ -369,6 +373,9 @@ struct impl_match {
 };
 
 static const struct impl_match impl_match[] = {
+	{ .pmiidr = ARM_CSPMU_IMPL_ID_NVIDIA,
+	  .mask = ARM_CSPMU_PMIIDR_IMPLEMENTER,
+	  .impl_init_ops = nv_cspmu_init_ops },
 	{}
 };
 
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
new file mode 100644
index 000000000000..261f20680bc1
--- /dev/null
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
@@ -0,0 +1,367 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#include "nvidia_cspmu.h"
+
+#define NV_MCF_PCIE_PORT_COUNT       10ULL
+#define NV_MCF_PCIE_FILTER_ID_MASK   GENMASK_ULL(NV_MCF_PCIE_PORT_COUNT - 1, 0)
+
+#define NV_MCF_GPU_PORT_COUNT        2ULL
+#define NV_MCF_GPU_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_GPU_PORT_COUNT - 1, 0)
+
+#define NV_MCF_NVL_PORT_COUNT        4ULL
+#define NV_MCF_NVL_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_NVL_PORT_COUNT - 1, 0)
+
+#define NV_SCF_MCF_PRODID_MASK       GENMASK(31, 0)
+
+#define NV_FORMAT_NAME_GENERIC	0
+
+#define to_nv_cspmu_ctx(cspmu)	((struct nv_cspmu_ctx *)(cspmu->impl.ctx))
+
+#define NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)	\
+	ARM_CSPMU_EVENT_ATTR(_pref##_num##_suff, _config)
+
+#define NV_CSPMU_EVENT_ATTR_4(_pref, _suff, _config)			\
+	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),	\
+	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),	\
+	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),	\
+	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
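+
+/*
+ * For example, NV_CSPMU_EVENT_ATTR_4(socket, rd_data, 0x101) defines the
+ * events socket_0_rd_data .. socket_3_rd_data with ids 0x101 to 0x104.
+ */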
+
+struct nv_cspmu_ctx {
+	const char *name;
+	u32 filter_mask;
+	struct attribute **event_attr;
+	struct attribute **format_attr;
+};
+
+static struct attribute *scf_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(bus_cycles,			0x1d),
+
+	ARM_CSPMU_EVENT_ATTR(scf_cache_allocate,		0xF0),
+	ARM_CSPMU_EVENT_ATTR(scf_cache_refill,			0xF1),
+	ARM_CSPMU_EVENT_ATTR(scf_cache,				0xF2),
+	ARM_CSPMU_EVENT_ATTR(scf_cache_wb,			0xF3),
+
+	NV_CSPMU_EVENT_ATTR_4(socket, rd_data,			0x101),
+	NV_CSPMU_EVENT_ATTR_4(socket, dl_rsp,			0x105),
+	NV_CSPMU_EVENT_ATTR_4(socket, wb_data,			0x109),
+	NV_CSPMU_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
+	NV_CSPMU_EVENT_ATTR_4(socket, prb_data,			0x111),
+
+	NV_CSPMU_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
+	NV_CSPMU_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
+	NV_CSPMU_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
+	NV_CSPMU_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
+	NV_CSPMU_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
+	NV_CSPMU_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
+
+	NV_CSPMU_EVENT_ATTR_4(socket, rd_access,		0x12d),
+	NV_CSPMU_EVENT_ATTR_4(socket, dl_access,		0x131),
+	NV_CSPMU_EVENT_ATTR_4(socket, wb_access,		0x135),
+	NV_CSPMU_EVENT_ATTR_4(socket, wr_access,		0x139),
+	NV_CSPMU_EVENT_ATTR_4(socket, ev_access,		0x13d),
+	NV_CSPMU_EVENT_ATTR_4(socket, prb_access,		0x141),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_outstanding,		0x151),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_outstanding,		0x155),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_data,			0x159),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
+
+	ARM_CSPMU_EVENT_ATTR(gmem_rd_data,			0x16d),
+	ARM_CSPMU_EVENT_ATTR(gmem_rd_access,			0x16e),
+	ARM_CSPMU_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
+	ARM_CSPMU_EVENT_ATTR(gmem_dl_rsp,			0x170),
+	ARM_CSPMU_EVENT_ATTR(gmem_dl_access,			0x171),
+	ARM_CSPMU_EVENT_ATTR(gmem_dl_outstanding,		0x172),
+	ARM_CSPMU_EVENT_ATTR(gmem_wb_data,			0x173),
+	ARM_CSPMU_EVENT_ATTR(gmem_wb_access,			0x174),
+	ARM_CSPMU_EVENT_ATTR(gmem_wb_outstanding,		0x175),
+	ARM_CSPMU_EVENT_ATTR(gmem_ev_rsp,			0x176),
+	ARM_CSPMU_EVENT_ATTR(gmem_ev_access,			0x177),
+	ARM_CSPMU_EVENT_ATTR(gmem_ev_outstanding,		0x178),
+	ARM_CSPMU_EVENT_ATTR(gmem_wr_data,			0x179),
+	ARM_CSPMU_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
+	ARM_CSPMU_EVENT_ATTR(gmem_wr_access,			0x17b),
+
+	NV_CSPMU_EVENT_ATTR_4(socket, wr_data,			0x17c),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
+	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_outstanding,		0x18c),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_data,			0x190),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_data,			0x194),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
+	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
+
+	ARM_CSPMU_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
+	ARM_CSPMU_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
+	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
+	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_outstanding,	0x1a3),
+	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_access,		0x1a4),
+
+	ARM_CSPMU_EVENT_ATTR(cmem_rd_data,			0x1a5),
+	ARM_CSPMU_EVENT_ATTR(cmem_rd_access,			0x1a6),
+	ARM_CSPMU_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
+	ARM_CSPMU_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
+	ARM_CSPMU_EVENT_ATTR(cmem_dl_access,			0x1a9),
+	ARM_CSPMU_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
+	ARM_CSPMU_EVENT_ATTR(cmem_wb_data,			0x1ab),
+	ARM_CSPMU_EVENT_ATTR(cmem_wb_access,			0x1ac),
+	ARM_CSPMU_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
+	ARM_CSPMU_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
+	ARM_CSPMU_EVENT_ATTR(cmem_ev_access,			0x1af),
+	ARM_CSPMU_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
+	ARM_CSPMU_EVENT_ATTR(cmem_wr_data,			0x1b1),
+	ARM_CSPMU_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_outstanding,		0x1bf),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_outstanding,		0x1c3),
+
+	ARM_CSPMU_EVENT_ATTR(ocu_prb_access,			0x1c7),
+	ARM_CSPMU_EVENT_ATTR(ocu_prb_data,			0x1c8),
+	ARM_CSPMU_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
+
+	ARM_CSPMU_EVENT_ATTR(cmem_wr_access,			0x1ca),
+
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
+	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_outstanding,		0x1d7),
+
+	ARM_CSPMU_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
+
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *mcf_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(rd_bytes_loc,			0x0),
+	ARM_CSPMU_EVENT_ATTR(rd_bytes_rem,			0x1),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes_loc,			0x2),
+	ARM_CSPMU_EVENT_ATTR(wr_bytes_rem,			0x3),
+	ARM_CSPMU_EVENT_ATTR(total_bytes_loc,			0x4),
+	ARM_CSPMU_EVENT_ATTR(total_bytes_rem,			0x5),
+	ARM_CSPMU_EVENT_ATTR(rd_req_loc,			0x6),
+	ARM_CSPMU_EVENT_ATTR(rd_req_rem,			0x7),
+	ARM_CSPMU_EVENT_ATTR(wr_req_loc,			0x8),
+	ARM_CSPMU_EVENT_ATTR(wr_req_rem,			0x9),
+	ARM_CSPMU_EVENT_ATTR(total_req_loc,			0xa),
+	ARM_CSPMU_EVENT_ATTR(total_req_rem,			0xb),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs_loc,			0xc),
+	ARM_CSPMU_EVENT_ATTR(rd_cum_outs_rem,			0xd),
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *generic_pmu_event_attrs[] = {
+	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
+	NULL,
+};
+
+static struct attribute *scf_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	NULL,
+};
+
+static struct attribute *mcf_pcie_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(root_port, "config1:0-9"),
+	NULL,
+};
+
+static struct attribute *mcf_gpu_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(gpu, "config1:0-1"),
+	NULL,
+};
+
+static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_ATTR(rem_socket, "config1:0-3"),
+	NULL,
+};
+
+static struct attribute *generic_pmu_format_attrs[] = {
+	ARM_CSPMU_FORMAT_EVENT_ATTR,
+	ARM_CSPMU_FORMAT_FILTER_ATTR,
+	NULL,
+};
+
+static struct attribute **
+nv_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
+{
+	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+
+	return ctx->event_attr;
+}
+
+static struct attribute **
+nv_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
+{
+	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+
+	return ctx->format_attr;
+}
+
+static const char *
+nv_cspmu_get_name(const struct arm_cspmu *cspmu)
+{
+	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
+
+	return ctx->name;
+}
+
+static u32 nv_cspmu_event_filter(const struct perf_event *event)
+{
+	const struct nv_cspmu_ctx *ctx =
+		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
+
+	return event->attr.config1 & ctx->filter_mask;
+}
+
+enum nv_cspmu_name_fmt {
+	NAME_FMT_GENERIC,
+	NAME_FMT_PROC
+};
+
+struct nv_cspmu_match {
+	u32 prodid;
+	u32 prodid_mask;
+	u64 filter_mask;
+	const char *name_pattern;
+	enum nv_cspmu_name_fmt name_fmt;
+	struct attribute **event_attr;
+	struct attribute **format_attr;
+};
+
+static const struct nv_cspmu_match nv_cspmu_match[] = {
+	{ .prodid = 0x103,
+	  .prodid_mask = NV_SCF_MCF_PRODID_MASK,
+	  .filter_mask = NV_MCF_PCIE_FILTER_ID_MASK,
+	  .name_pattern = "nvidia_mcf_pcie_pmu_%u",
+	  .name_fmt = NAME_FMT_PROC,
+	  .event_attr = mcf_pmu_event_attrs,
+	  .format_attr = mcf_pcie_pmu_format_attrs },
+	{ .prodid = 0x104,
+	  .prodid_mask = NV_SCF_MCF_PRODID_MASK,
+	  .filter_mask = NV_MCF_GPU_FILTER_ID_MASK,
+	  .name_pattern = "nvidia_mcf_gpuvir_pmu_%u",
+	  .name_fmt = NAME_FMT_PROC,
+	  .event_attr = mcf_pmu_event_attrs,
+	  .format_attr = mcf_gpu_pmu_format_attrs },
+	{ .prodid = 0x105,
+	  .prodid_mask = NV_SCF_MCF_PRODID_MASK,
+	  .filter_mask = NV_MCF_GPU_FILTER_ID_MASK,
+	  .name_pattern = "nvidia_mcf_gpu_pmu_%u",
+	  .name_fmt = NAME_FMT_PROC,
+	  .event_attr = mcf_pmu_event_attrs,
+	  .format_attr = mcf_gpu_pmu_format_attrs },
+	{ .prodid = 0x106,
+	  .prodid_mask = NV_SCF_MCF_PRODID_MASK,
+	  .filter_mask = NV_MCF_NVL_FILTER_ID_MASK,
+	  .name_pattern = "nvidia_mcf_nvlink_pmu_%u",
+	  .name_fmt = NAME_FMT_PROC,
+	  .event_attr = mcf_pmu_event_attrs,
+	  .format_attr = mcf_nvlink_pmu_format_attrs },
+	{ .prodid = 0x2CF,
+	  .prodid_mask = NV_SCF_MCF_PRODID_MASK,
+	  .filter_mask = 0x0,
+	  .name_pattern = "nvidia_scf_pmu_%u",
+	  .name_fmt = NAME_FMT_PROC,
+	  .event_attr = scf_pmu_event_attrs,
+	  .format_attr = scf_pmu_format_attrs },
+	{ .prodid = 0,
+	  .prodid_mask = 0,
+	  .filter_mask = ARM_CSPMU_FILTER_MASK,
+	  .name_pattern = "nvidia_uncore_pmu_%u",
+	  .name_fmt = NAME_FMT_GENERIC,
+	  .event_attr = generic_pmu_event_attrs,
+	  .format_attr = generic_pmu_format_attrs },
+};
+
+static char *nv_cspmu_format_name(const struct arm_cspmu *cspmu,
+				  const struct nv_cspmu_match *match)
+{
+	char *name;
+	struct device *dev = cspmu->dev;
+
+	static atomic_t pmu_generic_idx = {0};
+
+	switch (match->name_fmt) {
+	case NAME_FMT_PROC:
+		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
+				       cspmu->apmt_node->proc_affinity);
+		break;
+	case NAME_FMT_GENERIC:
+		name = devm_kasprintf(dev, GFP_KERNEL, match->name_pattern,
+				       atomic_fetch_inc(&pmu_generic_idx));
+		break;
+	default:
+		name = NULL;
+		break;
+	}
+
+	return name;
+}
+
+int nv_cspmu_init_ops(struct arm_cspmu *cspmu)
+{
+	u32 prodid;
+	struct nv_cspmu_ctx *ctx;
+	struct device *dev = cspmu->dev;
+	struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
+	const struct nv_cspmu_match *match = nv_cspmu_match;
+
+	ctx = devm_kzalloc(dev, sizeof(struct nv_cspmu_ctx), GFP_KERNEL);
+	if (!ctx)
+		return -ENOMEM;
+
+	prodid = FIELD_GET(ARM_CSPMU_PMIIDR_PRODUCTID, cspmu->impl.pmiidr);
+
+	/* Find matching PMU; the last entry is the generic fallback. */
+	for (; match->prodid; match++) {
+		const u32 prodid_mask = match->prodid_mask;
+
+		if ((match->prodid & prodid_mask) == (prodid & prodid_mask))
+			break;
+	}
+
+	ctx->name		= nv_cspmu_format_name(cspmu, match);
+	ctx->filter_mask	= match->filter_mask;
+	ctx->event_attr		= match->event_attr;
+	ctx->format_attr	= match->format_attr;
+
+	cspmu->impl.ctx = ctx;
+
+	/* NVIDIA specific callbacks. */
+	impl_ops->event_filter			= nv_cspmu_event_filter;
+	impl_ops->get_event_attrs		= nv_cspmu_get_event_attrs;
+	impl_ops->get_format_attrs		= nv_cspmu_get_format_attrs;
+	impl_ops->get_name			= nv_cspmu_get_name;
+
+	/* Set others to NULL to use default callback. */
+	impl_ops->event_type			= NULL;
+	impl_ops->event_attr_is_visible		= NULL;
+	impl_ops->get_identifier		= NULL;
+	impl_ops->is_cycle_counter_event	= NULL;
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(nv_cspmu_init_ops);
diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.h b/drivers/perf/arm_cspmu/nvidia_cspmu.h
new file mode 100644
index 000000000000..eefba85644f6
--- /dev/null
+++ b/drivers/perf/arm_cspmu/nvidia_cspmu.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0
+ *
+ * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
+ *
+ */
+
+/* Support for NVIDIA specific attributes. */
+
+#ifndef __NVIDIA_CSPMU_H__
+#define __NVIDIA_CSPMU_H__
+
+#include "arm_cspmu.h"
+
+/* Allocate NVIDIA descriptor. */
+int nv_cspmu_init_ops(struct arm_cspmu *cspmu);
+
+#endif /* __NVIDIA_CSPMU_H__ */
-- 
2.17.1



* RE: [PATCH v4 0/2] perf: ARM CoreSight PMU support
  2022-08-14 18:23 [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
  2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-08-14 18:23 ` [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
@ 2022-08-23 17:24 ` Besar Wicaksono
  2022-09-22 13:54   ` Will Deacon
  2 siblings, 1 reply; 13+ messages in thread
From: Besar Wicaksono @ 2022-08-23 17:24 UTC (permalink / raw)
  To: suzuki.poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan

Hi Reviewers,

Could we get some comments on this patchset?
An estimate of when this could get your review would also be greatly appreciated.

Regards,
Besar

> -----Original Message-----
> From: Besar Wicaksono <bwicaksono@nvidia.com>
> Sent: Sunday, August 14, 2022 1:24 PM
> To: suzuki.poulose@arm.com; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; mathieu.poirier@linaro.org;
> mike.leach@linaro.org; leo.yan@linaro.org; Besar Wicaksono
> <bwicaksono@nvidia.com>
> Subject: [PATCH v4 0/2] perf: ARM CoreSight PMU support
> 
> Add driver support for ARM CoreSight PMU device and event attributes for
> NVIDIA
> implementation. The code is based on ARM Coresight PMU architecture and
> ACPI ARM
> Performance Monitoring Unit table (APMT) specification below:
>  * ARM Coresight PMU:
>         https://developer.arm.com/documentation/ihi0091/latest
>  * APMT: https://developer.arm.com/documentation/den0117/latest
> 
> The patchset applies on top of
>   https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
>   master next-20220524
> 
> For APMT support, please see patchset:
> https://lkml.org/lkml/2022/4/19/1395
> 
> Changes from v3:
>  * Driver is now probing "arm-cs-arch-pmu" device.
>  * The driver files, directory, functions are renamed with "arm_cspmu"
> prefix.
>  * Use Kconfig ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU.
>  * Add kernel doc for NVIDIA Uncore PMU.
>  * Use GENMASK and FIELD_GET macros everywhere.
> Thanks to suzuki.poulose@arm.com and will@kernel.org for the review
> comments.
> v3: https://lore.kernel.org/linux-arm-kernel/20220621055035.31766-1-
> bwicaksono@nvidia.com/
> 
> Changes from v2:
>  * Driver is now probing "arm-system-pmu" device.
>  * Change default PMU naming to "arm_<APMT node type>_pmu".
>  * Add implementor ops to generate custom name.
> Thanks to suzuki.poulose@arm.com for the review comments.
> v2: https://lore.kernel.org/linux-arm-kernel/20220515163044.50055-1-
> bwicaksono@nvidia.com/
> 
> Changes from v1:
>  * Remove CPU arch dependency.
>  * Remove 32-bit read/write helper function and just use read/writel.
>  * Add .is_visible into event attribute to filter out cycle counter event.
>  * Update pmiidr matching.
>  * Remove read-modify-write on PMCR since the driver only writes to
> PMCR.E.
>  * Assign default cycle event outside the 32-bit PMEVTYPER range.
>  * Rework the active event and used counter tracking.
> Thanks to robin.murphy@arm.com for the review comments.
> v1: https://lore.kernel.org/linux-arm-kernel/20220509002810.12412-1-
> bwicaksono@nvidia.com/
> 
> Besar Wicaksono (2):
>   perf: arm_cspmu: Add support for ARM CoreSight PMU driver
>   perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
> 
>  Documentation/admin-guide/perf/index.rst      |    1 +
>  Documentation/admin-guide/perf/nvidia-pmu.rst |  120 ++
>  arch/arm64/configs/defconfig                  |    1 +
>  drivers/perf/Kconfig                          |    2 +
>  drivers/perf/Makefile                         |    1 +
>  drivers/perf/arm_cspmu/Kconfig                |   13 +
>  drivers/perf/arm_cspmu/Makefile               |    7 +
>  drivers/perf/arm_cspmu/arm_cspmu.c            | 1269 +++++++++++++++++
>  drivers/perf/arm_cspmu/arm_cspmu.h            |  151 ++
>  drivers/perf/arm_cspmu/nvidia_cspmu.c         |  367 +++++
>  drivers/perf/arm_cspmu/nvidia_cspmu.h         |   17 +
>  11 files changed, 1949 insertions(+)
>  create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
>  create mode 100644 drivers/perf/arm_cspmu/Kconfig
>  create mode 100644 drivers/perf/arm_cspmu/Makefile
>  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
>  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
>  create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
>  create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
> 
> 
> base-commit: 09ce5091ff971cdbfd67ad84dc561ea27f10d67a
> --
> 2.17.1



* Re: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
@ 2022-09-22 13:52   ` Will Deacon
  2022-09-27  3:59     ` Besar Wicaksono
  2022-09-28  8:31     ` Michael Williams (ATG)
  2022-09-27 11:39   ` Suzuki K Poulose
  1 sibling, 2 replies; 13+ messages in thread
From: Will Deacon @ 2022-09-22 13:52 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: suzuki.poulose, robin.murphy, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan

On Sun, Aug 14, 2022 at 01:23:50PM -0500, Besar Wicaksono wrote:
> Add support for ARM CoreSight PMU driver framework and interfaces.
> The driver provides a generic implementation to operate uncore PMUs based
> on the ARM CoreSight PMU architecture. The driver also provides an
> interface to get vendor/implementation-specific information, for example
> event attributes and formatting.
> 
> The specification used in this implementation can be found below:
>  * ACPI Arm Performance Monitoring Unit table:
>         https://developer.arm.com/documentation/den0117/latest
>  * ARM Coresight PMU architecture:
>         https://developer.arm.com/documentation/ihi0091/latest
> 
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>  arch/arm64/configs/defconfig       |    1 +
>  drivers/perf/Kconfig               |    2 +
>  drivers/perf/Makefile              |    1 +
>  drivers/perf/arm_cspmu/Kconfig     |   13 +
>  drivers/perf/arm_cspmu/Makefile    |    6 +
>  drivers/perf/arm_cspmu/arm_cspmu.c | 1262 ++++++++++++++++++++++++++++
>  drivers/perf/arm_cspmu/arm_cspmu.h |  151 ++++
>  7 files changed, 1436 insertions(+)
>  create mode 100644 drivers/perf/arm_cspmu/Kconfig
>  create mode 100644 drivers/perf/arm_cspmu/Makefile
>  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
>  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h

[...]

> diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> new file mode 100644
> index 000000000000..410876f86eb0
> --- /dev/null
> +++ b/drivers/perf/arm_cspmu/arm_cspmu.c

[...]

> +/*
> + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> + */
> +static u64 read_reg64_hilohi(const void __iomem *addr)
> +{
> +	u32 val_lo, val_hi;
> +	u64 val;
> +
> +	/* Use high-low-high sequence to avoid tearing */
> +	do {
> +		val_hi = readl(addr + 4);
> +		val_lo = readl(addr);
> +	} while (val_hi != readl(addr + 4));

Hmm, we probably want a timeout or something in here so we don't lock
up the CPU if the device goes wonky.

With that, how about adding this as a helper to
include/linux/io-64-nonatomic-*o.h so other folks can reuse it?
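
For illustration, a bounded variant could look something like this sketch
(the helper name and retry limit here are placeholders for the example,
not an existing kernel API):

	static u64 read_reg64_hilohi_bounded(const void __iomem *addr)
	{
		u32 val_lo, val_hi;
		int retries = 3; /* arbitrary bound for the sketch */

		/* Hi-lo-hi sequence, but give up after a few attempts. */
		do {
			val_hi = readl(addr + 4);
			val_lo = readl(addr);
		} while (val_hi != readl(addr + 4) && --retries > 0);

		return ((u64)val_hi << 32) | val_lo;
	}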

> +/* Check if PMU supports 64-bit single copy atomic. */
> +static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
> +{
> +	return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
> +}

Is this just there because the architecture permits it, or are folks
actually hanging these things off 32-bit MMIO buses on arm64 SoCs?

> +static int arm_cspmu_request_irq(struct arm_cspmu *cspmu)
> +{
> +	int irq, ret;
> +	struct device *dev;
> +	struct platform_device *pdev;
> +	struct acpi_apmt_node *apmt_node;
> +
> +	dev = cspmu->dev;
> +	pdev = to_platform_device(dev);
> +	apmt_node = cspmu->apmt_node;
> +
> +	/* Skip IRQ request if the PMU does not support overflow interrupt. */
> +	if (apmt_node->ovflw_irq == 0)
> +		return 0;

Set PERF_PMU_CAP_NO_INTERRUPT?
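
For reference, a minimal sketch of that suggestion (the exact placement is
a guess; it assumes cspmu->pmu is already initialized at this point):

	/* Skip IRQ request if the PMU does not support overflow interrupt. */
	if (apmt_node->ovflw_irq == 0) {
		/* Tell perf core there is no overflow interrupt. */
		cspmu->pmu.capabilities |= PERF_PMU_CAP_NO_INTERRUPT;
		return 0;
	}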

Will


* Re: [PATCH v4 0/2] perf: ARM CoreSight PMU support
  2022-08-23 17:24 ` [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
@ 2022-09-22 13:54   ` Will Deacon
  0 siblings, 0 replies; 13+ messages in thread
From: Will Deacon @ 2022-09-22 13:54 UTC (permalink / raw)
  To: Besar Wicaksono
  Cc: suzuki.poulose, robin.murphy, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan

On Tue, Aug 23, 2022 at 05:24:05PM +0000, Besar Wicaksono wrote:
> Hi Reviewers,
> 
> Could we get some comments on this patchset?
> An estimate of when this could get your review would also be greatly appreciated.

I'd like Suzuki's ack before I merge this, although it mostly looks alright
to me from a quick look.

Will


* RE: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-09-22 13:52   ` Will Deacon
@ 2022-09-27  3:59     ` Besar Wicaksono
  2022-09-28  8:31     ` Michael Williams (ATG)
  1 sibling, 0 replies; 13+ messages in thread
From: Besar Wicaksono @ 2022-09-27  3:59 UTC (permalink / raw)
  To: Will Deacon
  Cc: suzuki.poulose, robin.murphy, catalin.marinas, mark.rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan

Hi Will,

Thanks for the comment. Please see my response inline.

> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: Thursday, September 22, 2022 8:53 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Cc: suzuki.poulose@arm.com; robin.murphy@arm.com;
> catalin.marinas@arm.com; mark.rutland@arm.com; linux-arm-
> kernel@lists.infradead.org; linux-kernel@vger.kernel.org; linux-
> tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; mathieu.poirier@linaro.org;
> mike.leach@linaro.org; leo.yan@linaro.org
> Subject: Re: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM
> CoreSight PMU driver
> 
> On Sun, Aug 14, 2022 at 01:23:50PM -0500, Besar Wicaksono wrote:
> > Add support for ARM CoreSight PMU driver framework and interfaces.
> > The driver provides a generic implementation to operate uncore PMUs based
> > on the ARM CoreSight PMU architecture. The driver also provides an
> > interface to get vendor/implementation-specific information, for example
> > event attributes and formatting.
> >
> > The specification used in this implementation can be found below:
> >  * ACPI Arm Performance Monitoring Unit table:
> >         https://developer.arm.com/documentation/den0117/latest
> >  * ARM Coresight PMU architecture:
> >         https://developer.arm.com/documentation/ihi0091/latest
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >  arch/arm64/configs/defconfig       |    1 +
> >  drivers/perf/Kconfig               |    2 +
> >  drivers/perf/Makefile              |    1 +
> >  drivers/perf/arm_cspmu/Kconfig     |   13 +
> >  drivers/perf/arm_cspmu/Makefile    |    6 +
> >  drivers/perf/arm_cspmu/arm_cspmu.c | 1262
> ++++++++++++++++++++++++++++
> >  drivers/perf/arm_cspmu/arm_cspmu.h |  151 ++++
> >  7 files changed, 1436 insertions(+)
> >  create mode 100644 drivers/perf/arm_cspmu/Kconfig
> >  create mode 100644 drivers/perf/arm_cspmu/Makefile
> >  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
> >  create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
> 
> [...]
> 
> > diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c
> b/drivers/perf/arm_cspmu/arm_cspmu.c
> > new file mode 100644
> > index 000000000000..410876f86eb0
> > --- /dev/null
> > +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> 
> [...]
> 
> > +/*
> > + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> > + */
> > +static u64 read_reg64_hilohi(const void __iomem *addr)
> > +{
> > +     u32 val_lo, val_hi;
> > +     u64 val;
> > +
> > +     /* Use high-low-high sequence to avoid tearing */
> > +     do {
> > +             val_hi = readl(addr + 4);
> > +             val_lo = readl(addr);
> > +     } while (val_hi != readl(addr + 4));
> 
> Hmm, we probably want a timeout or something in here so we don't lock
> up the CPU if the device goes wonky.
> 

This function is used to read the counter register. The perf driver APIs
(read, stop) that use this function do not return an error code, so I am
not sure if we can just break the loop and return 0. Any suggestions?
Is triggering a panic acceptable?

> With that, how about adding this as a helper to
> include/linux/io-64-nonatomic-*o.h so other folks can reuse it?
> 
> > +/* Check if PMU supports 64-bit single copy atomic. */
> > +static inline bool supports_64bit_atomics(const struct arm_cspmu
> *cspmu)
> > +{
> > +     return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC,
> SUPP);
> > +}
> 
> Is this just there because the architecture permits it, or are folks
> actually hanging these things off 32-bit MMIO buses on arm64 SoCs?
> 

Yes, the PMU spec permits systems that need to break a 64-bit access into
a pair of 32-bit accesses.

> > +static int arm_cspmu_request_irq(struct arm_cspmu *cspmu)
> > +{
> > +     int irq, ret;
> > +     struct device *dev;
> > +     struct platform_device *pdev;
> > +     struct acpi_apmt_node *apmt_node;
> > +
> > +     dev = cspmu->dev;
> > +     pdev = to_platform_device(dev);
> > +     apmt_node = cspmu->apmt_node;
> > +
> > +     /* Skip IRQ request if the PMU does not support overflow interrupt. */
> > +     if (apmt_node->ovflw_irq == 0)
> > +             return 0;
> 
> Set PERF_PMU_CAP_NO_INTERRUPT?
> 

Thanks, I will apply it on the next version.

> Will


* Re: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
  2022-09-22 13:52   ` Will Deacon
@ 2022-09-27 11:39   ` Suzuki K Poulose
  2022-09-28  1:27     ` Besar Wicaksono
  1 sibling, 1 reply; 13+ messages in thread
From: Suzuki K Poulose @ 2022-09-27 11:39 UTC (permalink / raw)
  To: Besar Wicaksono, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan

On 14/08/2022 19:23, Besar Wicaksono wrote:
> Add support for ARM CoreSight PMU driver framework and interfaces.
> The driver provides a generic implementation to operate uncore PMUs based
> on the ARM CoreSight PMU architecture. The driver also provides an
> interface to get vendor/implementation-specific information, for example
> event attributes and formatting.
> 
> The specification used in this implementation can be found below:
>   * ACPI Arm Performance Monitoring Unit table:
>          https://developer.arm.com/documentation/den0117/latest
>   * ARM Coresight PMU architecture:
>          https://developer.arm.com/documentation/ihi0091/latest
> 
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>   arch/arm64/configs/defconfig       |    1 +
>   drivers/perf/Kconfig               |    2 +
>   drivers/perf/Makefile              |    1 +
>   drivers/perf/arm_cspmu/Kconfig     |   13 +
>   drivers/perf/arm_cspmu/Makefile    |    6 +
>   drivers/perf/arm_cspmu/arm_cspmu.c | 1262 ++++++++++++++++++++++++++++
>   drivers/perf/arm_cspmu/arm_cspmu.h |  151 ++++
>   7 files changed, 1436 insertions(+)
>   create mode 100644 drivers/perf/arm_cspmu/Kconfig
>   create mode 100644 drivers/perf/arm_cspmu/Makefile
>   create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
>   create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
> 
> diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> index 7d1105343bc2..ee31c9159a5b 100644
> --- a/arch/arm64/configs/defconfig
> +++ b/arch/arm64/configs/defconfig
> @@ -1212,6 +1212,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
>   CONFIG_PHY_TEGRA_XUSB=y
>   CONFIG_PHY_AM654_SERDES=m
>   CONFIG_PHY_J721E_WIZ=m
> +CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=y
>   CONFIG_ARM_SMMU_V3_PMU=m
>   CONFIG_FSL_IMX8_DDR_PMU=m
>   CONFIG_QCOM_L2_PMU=y
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 1e2d69453771..c94d3601eb48 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
>   	  Enable perf support for Marvell DDR Performance monitoring
>   	  event on CN10K platform.
>   
> +source "drivers/perf/arm_cspmu/Kconfig"
> +
>   endmenu
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index 57a279c61df5..3bc9323f0965 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
>   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
>   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> +obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
> diff --git a/drivers/perf/arm_cspmu/Kconfig b/drivers/perf/arm_cspmu/Kconfig
> new file mode 100644
> index 000000000000..c2c56ecafccb
> --- /dev/null
> +++ b/drivers/perf/arm_cspmu/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0
> +#
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +
> +config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> +	tristate "ARM Coresight Architecture PMU"
> +	depends on ACPI
> +	depends on ACPI_APMT || COMPILE_TEST
> +	help
> +	  Provides support for performance monitoring unit (PMU) devices
> +	  based on ARM CoreSight PMU architecture. Note that this PMU
> +	  architecture does not have relationship with the ARM CoreSight
> +	  Self-Hosted Tracing.
> diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
> new file mode 100644
> index 000000000000..cdc3455f74d8
> --- /dev/null
> +++ b/drivers/perf/arm_cspmu/Makefile
> @@ -0,0 +1,6 @@
> +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> +#
> +# SPDX-License-Identifier: GPL-2.0
> +
> +obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
> +	arm_cspmu.o
> diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> new file mode 100644
> index 000000000000..410876f86eb0
> --- /dev/null
> +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> @@ -0,0 +1,1262 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * ARM CoreSight Architecture PMU driver.
> + *
> + * This driver adds support for uncore PMU based on ARM CoreSight Performance
> + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
> + * like other uncore PMUs, it does not support process specific events and
> + * cannot be used in sampling mode.
> + *
> + * This code is based on other uncore PMUs like ARM DSU PMU. It provides a
> + * generic implementation to operate the PMU according to CoreSight PMU
> + * architecture and ACPI ARM PMU table (APMT) documents below:
> + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
> + *   - APMT document number: ARM DEN0117.
> + *
> + * The user should refer to the vendor technical documentation to get details
> + * about the supported events.
> + *
> + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + *
> + */
> +
> +#include <linux/acpi.h>
> +#include <linux/cacheinfo.h>
> +#include <linux/ctype.h>
> +#include <linux/interrupt.h>
> +#include <linux/io-64-nonatomic-lo-hi.h>
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <linux/platform_device.h>
> +#include <acpi/processor.h>
> +
> +#include "arm_cspmu.h"
> +
> +#define PMUNAME "arm_cspmu"
> +#define DRVNAME "arm-cs-arch-pmu"
> +
> +#define ARM_CSPMU_CPUMASK_ATTR(_name, _config)			\
> +	ARM_CSPMU_EXT_ATTR(_name, arm_cspmu_cpumask_show,	\
> +				(unsigned long)_config)
> +
> +/*
> + * CoreSight PMU Arch register offsets.
> + */
> +#define PMEVCNTR_LO					0x0
> +#define PMEVCNTR_HI					0x4
> +#define PMEVTYPER					0x400
> +#define PMCCFILTR					0x47C
> +#define PMEVFILTR					0xA00
> +#define PMCNTENSET					0xC00
> +#define PMCNTENCLR					0xC20
> +#define PMINTENSET					0xC40
> +#define PMINTENCLR					0xC60
> +#define PMOVSCLR					0xC80
> +#define PMOVSSET					0xCC0
> +#define PMCFGR						0xE00
> +#define PMCR						0xE04
> +#define PMIIDR						0xE08
> +
> +/* PMCFGR register field */
> +#define PMCFGR_NCG					GENMASK(31, 28)
> +#define PMCFGR_HDBG					BIT(24)
> +#define PMCFGR_TRO					BIT(23)
> +#define PMCFGR_SS					BIT(22)
> +#define PMCFGR_FZO					BIT(21)
> +#define PMCFGR_MSI					BIT(20)
> +#define PMCFGR_UEN					BIT(19)
> +#define PMCFGR_NA					BIT(17)
> +#define PMCFGR_EX					BIT(16)
> +#define PMCFGR_CCD					BIT(15)
> +#define PMCFGR_CC					BIT(14)
> +#define PMCFGR_SIZE					GENMASK(13, 8)
> +#define PMCFGR_N					GENMASK(7, 0)
> +
> +/* PMCR register field */
> +#define PMCR_TRO					BIT(11)
> +#define PMCR_HDBG					BIT(10)
> +#define PMCR_FZO					BIT(9)
> +#define PMCR_NA						BIT(8)
> +#define PMCR_DP						BIT(5)
> +#define PMCR_X						BIT(4)
> +#define PMCR_D						BIT(3)
> +#define PMCR_C						BIT(2)
> +#define PMCR_P						BIT(1)
> +#define PMCR_E						BIT(0)
> +
> +/* Each SET/CLR register supports up to 32 counters. */
> +#define ARM_CSPMU_SET_CLR_COUNTER_SHIFT		5
> +#define ARM_CSPMU_SET_CLR_COUNTER_NUM		\
> +	(1 << ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
> +
> +/* The number of 32-bit SET/CLR register that can be supported. */
> +#define ARM_CSPMU_SET_CLR_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
> +
> +static_assert(
> +	(ARM_CSPMU_SET_CLR_MAX_NUM * ARM_CSPMU_SET_CLR_COUNTER_NUM) >=
> +	ARM_CSPMU_MAX_HW_CNTRS);
> +
> +/* Convert counter idx into SET/CLR register number. */
> +#define COUNTER_TO_SET_CLR_ID(idx)			\
> +	(idx >> ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
> +
> +/* Convert counter idx into SET/CLR register bit. */
> +#define COUNTER_TO_SET_CLR_BIT(idx)			\
> +	(idx & (ARM_CSPMU_SET_CLR_COUNTER_NUM - 1))
> +
> +#define ARM_CSPMU_ACTIVE_CPU_MASK		0x0
> +#define ARM_CSPMU_ASSOCIATED_CPU_MASK		0x1
> +
> +/* Check if field f in flags is set with value v */
> +#define CHECK_APMT_FLAG(flags, f, v) \
> +	((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
> +
> +/* Check and use default if implementer doesn't provide attribute callback */
> +#define CHECK_DEFAULT_IMPL_OPS(ops, callback)			\
> +	do {							\
> +		if (!ops->callback)				\
> +			ops->callback = arm_cspmu_ ## callback;	\
> +	} while (0)
> +
> +static unsigned long arm_cspmu_cpuhp_state;
> +
> +/*
> + * In CoreSight PMU architecture, all of the MMIO registers are 32-bit except
> + * counter register. The counter register can be implemented as 32-bit or 64-bit
> + * register depending on the value of PMCFGR.SIZE field. For 64-bit access,
> + * single-copy 64-bit atomic support is implementation defined. APMT node flag
> + * is used to identify if the PMU supports 64-bit single copy atomic. If 64-bit
> + * single copy atomic is not supported, the driver treats the register as a pair
> + * of 32-bit register.
> + */
> +
> +/*
> + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> + */
> +static u64 read_reg64_hilohi(const void __iomem *addr)
> +{
> +	u32 val_lo, val_hi;
> +	u64 val;
> +
> +	/* Use high-low-high sequence to avoid tearing */
> +	do {
> +		val_hi = readl(addr + 4);
> +		val_lo = readl(addr);
> +	} while (val_hi != readl(addr + 4));
> +
> +	val = (((u64)val_hi << 32) | val_lo);
> +
> +	return val;
> +}
> +
> +/* Check if PMU supports 64-bit single copy atomic. */
> +static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
> +{
> +	return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
> +}
> +
> +/* Check if cycle counter is supported. */
> +static inline bool supports_cycle_counter(const struct arm_cspmu *cspmu)
> +{
> +	return (cspmu->pmcfgr & PMCFGR_CC);
> +}
> +
> +/* Get counter size, which is (PMCFGR_SIZE + 1). */
> +static inline u32 counter_size(const struct arm_cspmu *cspmu)
> +{
> +	return FIELD_GET(PMCFGR_SIZE, cspmu->pmcfgr) + 1;
> +}
> +
> +/* Get counter mask. */
> +static inline u64 counter_mask(const struct arm_cspmu *cspmu)
> +{
> +	return GENMASK_ULL(counter_size(cspmu) - 1, 0);
> +}
> +
> +/* Check if counter is implemented as 64-bit register. */
> +static inline bool use_64b_counter_reg(const struct arm_cspmu *cspmu)
> +{
> +	return (counter_size(cspmu) > 32);
> +}
> +
> +ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	return sysfs_emit(buf, "event=0x%llx\n",
> +			  (unsigned long long)eattr->var);
> +}
> +EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_event_show);
> +
> +/* Default event list. */
> +static struct attribute *arm_cspmu_event_attrs[] = {
> +	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> +static struct attribute **
> +arm_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
> +{
> +	return arm_cspmu_event_attrs;
> +}
> +
> +static umode_t
> +arm_cspmu_event_attr_is_visible(struct kobject *kobj,
> +				struct attribute *attr, int unused)
> +{
> +	struct device *dev = kobj_to_dev(kobj);
> +	struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
> +	struct perf_pmu_events_attr *eattr;
> +
> +	eattr = container_of(attr, typeof(*eattr), attr.attr);
> +
> +	/* Hide cycle event if not supported */
> +	if (!supports_cycle_counter(cspmu) &&
> +	    eattr->id == ARM_CSPMU_EVT_CYCLES_DEFAULT)
> +		return 0;
> +
> +	return attr->mode;
> +}
> +
> +ssize_t arm_cspmu_sysfs_format_show(struct device *dev,
> +				struct device_attribute *attr,
> +				char *buf)
> +{
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	return sysfs_emit(buf, "%s\n", (char *)eattr->var);
> +}
> +EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_format_show);
> +
> +static struct attribute *arm_cspmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_FILTER_ATTR,
> +	NULL,
> +};
> +
> +static struct attribute **
> +arm_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
> +{
> +	return arm_cspmu_format_attrs;
> +}
> +
> +static u32 arm_cspmu_event_type(const struct perf_event *event)
> +{
> +	return event->attr.config & ARM_CSPMU_EVENT_MASK;
> +}
> +
> +static bool arm_cspmu_is_cycle_counter_event(const struct perf_event *event)
> +{
> +	return (event->attr.config == ARM_CSPMU_EVT_CYCLES_DEFAULT);
> +}
> +
> +static u32 arm_cspmu_event_filter(const struct perf_event *event)
> +{
> +	return event->attr.config1 & ARM_CSPMU_FILTER_MASK;
> +}
> +
> +static ssize_t arm_cspmu_identifier_show(struct device *dev,
> +					 struct device_attribute *attr,
> +					 char *page)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
> +
> +	return sysfs_emit(page, "%s\n", cspmu->identifier);
> +}
> +
> +static struct device_attribute arm_cspmu_identifier_attr =
> +	__ATTR(identifier, 0444, arm_cspmu_identifier_show, NULL);
> +
> +static struct attribute *arm_cspmu_identifier_attrs[] = {
> +	&arm_cspmu_identifier_attr.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group arm_cspmu_identifier_attr_group = {
> +	.attrs = arm_cspmu_identifier_attrs,
> +};
> +
> +static const char *arm_cspmu_get_identifier(const struct arm_cspmu *cspmu)
> +{
> +	const char *identifier =
> +		devm_kasprintf(cspmu->dev, GFP_KERNEL, "%x",
> +			       cspmu->impl.pmiidr);
> +	return identifier;
> +}
> +
> +static const char *arm_cspmu_type_str[ACPI_APMT_NODE_TYPE_COUNT] = {
> +	"mc",
> +	"smmu",
> +	"pcie",
> +	"acpi",
> +	"cache",
> +};
> +
> +static const char *arm_cspmu_get_name(const struct arm_cspmu *cspmu)
> +{
> +	struct device *dev;
> +	struct acpi_apmt_node *apmt_node;
> +	u8 pmu_type;
> +	char *name;
> +	char acpi_hid_string[ACPI_ID_LEN] = { 0 };
> +	static atomic_t pmu_idx[ACPI_APMT_NODE_TYPE_COUNT] = { 0 };
> +
> +	dev = cspmu->dev;
> +	apmt_node = cspmu->apmt_node;
> +	pmu_type = apmt_node->type;
> +
> +	if (pmu_type >= ACPI_APMT_NODE_TYPE_COUNT) {
> +		dev_err(dev, "unsupported PMU type-%u\n", pmu_type);
> +		return NULL;
> +	}
> +
> +	if (pmu_type == ACPI_APMT_NODE_TYPE_ACPI) {
> +		memcpy(acpi_hid_string,
> +			&apmt_node->inst_primary,
> +			sizeof(apmt_node->inst_primary));
> +		name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%s_%u", PMUNAME,
> +				      arm_cspmu_type_str[pmu_type],
> +				      acpi_hid_string,
> +				      apmt_node->inst_secondary);
> +	} else {
> +		name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%d", PMUNAME,
> +				      arm_cspmu_type_str[pmu_type],
> +				      atomic_fetch_inc(&pmu_idx[pmu_type]));
> +	}
> +
> +	return name;
> +}
> +
> +static ssize_t arm_cspmu_cpumask_show(struct device *dev,
> +				      struct device_attribute *attr,
> +				      char *buf)
> +{
> +	struct pmu *pmu = dev_get_drvdata(dev);
> +	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> +	struct dev_ext_attribute *eattr =
> +		container_of(attr, struct dev_ext_attribute, attr);
> +	unsigned long mask_id = (unsigned long)eattr->var;
> +	const cpumask_t *cpumask;
> +
> +	switch (mask_id) {
> +	case ARM_CSPMU_ACTIVE_CPU_MASK:
> +		cpumask = &cspmu->active_cpu;
> +		break;
> +	case ARM_CSPMU_ASSOCIATED_CPU_MASK:
> +		cpumask = &cspmu->associated_cpus;
> +		break;
> +	default:
> +		return 0;
> +	}
> +	return cpumap_print_to_pagebuf(true, buf, cpumask);
> +}
> +
> +static struct attribute *arm_cspmu_cpumask_attrs[] = {
> +	ARM_CSPMU_CPUMASK_ATTR(cpumask, ARM_CSPMU_ACTIVE_CPU_MASK),
> +	ARM_CSPMU_CPUMASK_ATTR(associated_cpus, ARM_CSPMU_ASSOCIATED_CPU_MASK),
> +	NULL,
> +};
> +
> +static struct attribute_group arm_cspmu_cpumask_attr_group = {
> +	.attrs = arm_cspmu_cpumask_attrs,
> +};
> +
> +struct impl_match {
> +	u32 pmiidr;
> +	u32 mask;
> +	int (*impl_init_ops)(struct arm_cspmu *cspmu);
> +};
> +
> +static const struct impl_match impl_match[] = {
> +	{}
> +};
> +
> +static int arm_cspmu_init_impl_ops(struct arm_cspmu *cspmu)
> +{
> +	int ret;
> +	struct acpi_apmt_node *apmt_node = cspmu->apmt_node;
> +	struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> +	const struct impl_match *match = impl_match;
> +
> +	/*
> +	 * Get PMU implementer and product id from APMT node.
> +	 * If APMT node doesn't have implementer/product id, try to get it
> +	 * from PMIIDR.
> +	 */
> +	cspmu->impl.pmiidr =
> +		(apmt_node->impl_id) ? apmt_node->impl_id :
> +				       readl(cspmu->base0 + PMIIDR);
> +
> +	/* Find implementer specific attribute ops. */
> +	for (; match->pmiidr; match++) {
> +		const u32 mask = match->mask;
> +
> +		if ((match->pmiidr & mask) == (cspmu->impl.pmiidr & mask)) {
> +			ret = match->impl_init_ops(cspmu);
> +			if (ret)
> +				return ret;
> +
> +			break;
> +		}
> +	}
> +
> +	/* Use default callbacks if implementer doesn't provide one. */
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_event_attrs);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_format_attrs);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_identifier);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, get_name);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, is_cycle_counter_event);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_type);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_filter);
> +	CHECK_DEFAULT_IMPL_OPS(impl_ops, event_attr_is_visible);
> +
> +	return 0;
> +}
> +
> +static struct attribute_group *
> +arm_cspmu_alloc_event_attr_group(struct arm_cspmu *cspmu)
> +{
> +	struct attribute_group *event_group;
> +	struct device *dev = cspmu->dev;
> +	const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> +
> +	event_group =
> +		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> +	if (!event_group)
> +		return NULL;
> +
> +	event_group->name = "events";
> +	event_group->attrs = impl_ops->get_event_attrs(cspmu);
> +	event_group->is_visible = impl_ops->event_attr_is_visible;
> +
> +	return event_group;
> +}
> +
> +static struct attribute_group *
> +arm_cspmu_alloc_format_attr_group(struct arm_cspmu *cspmu)
> +{
> +	struct attribute_group *format_group;
> +	struct device *dev = cspmu->dev;
> +
> +	format_group =
> +		devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> +	if (!format_group)
> +		return NULL;
> +
> +	format_group->name = "format";
> +	format_group->attrs = cspmu->impl.ops.get_format_attrs(cspmu);
> +
> +	return format_group;
> +}
> +
> +static struct attribute_group **
> +arm_cspmu_alloc_attr_group(struct arm_cspmu *cspmu)
> +{
> +	struct attribute_group **attr_groups = NULL;
> +	struct device *dev = cspmu->dev;
> +	const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> +	int ret;
> +
> +	ret = arm_cspmu_init_impl_ops(cspmu);
> +	if (ret)
> +		return NULL;
> +
> +	cspmu->identifier = impl_ops->get_identifier(cspmu);
> +	cspmu->name = impl_ops->get_name(cspmu);
> +
> +	if (!cspmu->identifier || !cspmu->name)
> +		return NULL;
> +
> +	attr_groups = devm_kcalloc(dev, 5, sizeof(struct attribute_group *),
> +				   GFP_KERNEL);
> +	if (!attr_groups)
> +		return NULL;
> +
> +	attr_groups[0] = arm_cspmu_alloc_event_attr_group(cspmu);
> +	attr_groups[1] = arm_cspmu_alloc_format_attr_group(cspmu);
> +	attr_groups[2] = &arm_cspmu_identifier_attr_group;
> +	attr_groups[3] = &arm_cspmu_cpumask_attr_group;
> +
> +	if (!attr_groups[0] || !attr_groups[1])
> +		return NULL;
> +
> +	return attr_groups;
> +}
> +
> +static inline void arm_cspmu_reset_counters(struct arm_cspmu *cspmu)
> +{
> +	u32 pmcr = 0;
> +
> +	pmcr |= PMCR_P;
> +	pmcr |= PMCR_C;
> +	writel(pmcr, cspmu->base0 + PMCR);
> +}
> +
> +static inline void arm_cspmu_start_counters(struct arm_cspmu *cspmu)
> +{
> +	writel(PMCR_E, cspmu->base0 + PMCR);
> +}
> +
> +static inline void arm_cspmu_stop_counters(struct arm_cspmu *cspmu)
> +{
> +	writel(0, cspmu->base0 + PMCR);
> +}
> +
> +static void arm_cspmu_enable(struct pmu *pmu)
> +{
> +	bool disabled;
> +	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> +
> +	disabled = bitmap_empty(cspmu->hw_events.used_ctrs,
> +				cspmu->num_logical_ctrs);
> +
> +	if (disabled)
> +		return;
> +
> +	arm_cspmu_start_counters(cspmu);
> +}
> +
> +static void arm_cspmu_disable(struct pmu *pmu)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> +
> +	arm_cspmu_stop_counters(cspmu);
> +}
> +
> +static int arm_cspmu_get_event_idx(struct arm_cspmu_hw_events *hw_events,
> +				struct perf_event *event)
> +{
> +	int idx;
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +
> +	if (supports_cycle_counter(cspmu)) {
> +		if (cspmu->impl.ops.is_cycle_counter_event(event)) {
> +			/* Search for available cycle counter. */
> +			if (test_and_set_bit(cspmu->cycle_counter_logical_idx,
> +					     hw_events->used_ctrs))
> +				return -EAGAIN;
> +
> +			return cspmu->cycle_counter_logical_idx;
> +		}
> +
> +		/*
> +		 * Search a regular counter from the used counter bitmap.
> +		 * The cycle counter divides the bitmap into two parts. Search
> +		 * the first then second half to exclude the cycle counter bit.
> +		 */
> +		idx = find_first_zero_bit(hw_events->used_ctrs,
> +					  cspmu->cycle_counter_logical_idx);
> +		if (idx >= cspmu->cycle_counter_logical_idx) {
> +			idx = find_next_zero_bit(
> +				hw_events->used_ctrs,
> +				cspmu->num_logical_ctrs,
> +				cspmu->cycle_counter_logical_idx + 1);
> +		}
> +	} else {
> +		idx = find_first_zero_bit(hw_events->used_ctrs,
> +					  cspmu->num_logical_ctrs);
> +	}
> +
> +	if (idx >= cspmu->num_logical_ctrs)
> +		return -EAGAIN;
> +
> +	set_bit(idx, hw_events->used_ctrs);
> +
> +	return idx;
> +}
> +
> +static bool arm_cspmu_validate_event(struct pmu *pmu,
> +				 struct arm_cspmu_hw_events *hw_events,
> +				 struct perf_event *event)
> +{
> +	if (is_software_event(event))
> +		return true;
> +
> +	/* Reject groups spanning multiple HW PMUs. */
> +	if (event->pmu != pmu)
> +		return false;
> +
> +	return (arm_cspmu_get_event_idx(hw_events, event) >= 0);
> +}
> +
> +/*
> + * Make sure the group of events can be scheduled at once
> + * on the PMU.
> + */
> +static bool arm_cspmu_validate_group(struct perf_event *event)
> +{
> +	struct perf_event *sibling, *leader = event->group_leader;
> +	struct arm_cspmu_hw_events fake_hw_events;
> +
> +	if (event->group_leader == event)
> +		return true;
> +
> +	memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> +
> +	if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events, leader))
> +		return false;
> +
> +	for_each_sibling_event(sibling, leader) {
> +		if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events,
> +						  sibling))
> +			return false;
> +	}
> +
> +	return arm_cspmu_validate_event(event->pmu, &fake_hw_events, event);
> +}
> +
> +static int arm_cspmu_event_init(struct perf_event *event)
> +{
> +	struct arm_cspmu *cspmu;
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	cspmu = to_arm_cspmu(event->pmu);
> +
> +	/*
> +	 * Following other "uncore" PMUs, we do not support sampling mode or
> +	 * attach to a task (per-process mode).
> +	 */
> +	if (is_sampling_event(event)) {
> +		dev_dbg(cspmu->pmu.dev,
> +			"Can't support sampling events\n");
> +		return -EOPNOTSUPP;
> +	}
> +
> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> +		dev_dbg(cspmu->pmu.dev,
> +			"Can't support per-task counters\n");
> +		return -EINVAL;
> +	}
> +
> +	/*
> +	 * Make sure the CPU assignment is on one of the CPUs associated with
> +	 * this PMU.
> +	 */
> +	if (!cpumask_test_cpu(event->cpu, &cspmu->associated_cpus)) {
> +		dev_dbg(cspmu->pmu.dev,
> +			"Requested cpu is not associated with the PMU\n");
> +		return -EINVAL;
> +	}
> +
> +	/* Enforce the current active CPU to handle the events in this PMU. */
> +	event->cpu = cpumask_first(&cspmu->active_cpu);
> +	if (event->cpu >= nr_cpu_ids)
> +		return -EINVAL;
> +
> +	if (!arm_cspmu_validate_group(event))
> +		return -EINVAL;
> +
> +	/*
> +	 * The logical counter id is tracked with hw_perf_event.extra_reg.idx.
> +	 * The physical counter id is tracked with hw_perf_event.idx.
> +	 * We don't assign an index until we actually place the event onto
> +	 * hardware. Use -1 to signify that we haven't decided where to put it
> +	 * yet.
> +	 */
> +	hwc->idx = -1;
> +	hwc->extra_reg.idx = -1;
> +	hwc->config = cspmu->impl.ops.event_type(event);
> +
> +	return 0;
> +}
> +
> +static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
> +{
> +	return (PMEVCNTR_LO + (reg_sz * ctr_idx));
> +}
> +
> +static void arm_cspmu_write_counter(struct perf_event *event, u64 val)
> +{
> +	u32 offset;
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +
> +	if (use_64b_counter_reg(cspmu)) {
> +		offset = counter_offset(sizeof(u64), event->hw.idx);
> +
> +		writeq(val, cspmu->base1 + offset);
> +	} else {
> +		offset = counter_offset(sizeof(u32), event->hw.idx);
> +
> +		writel(lower_32_bits(val), cspmu->base1 + offset);
> +	}
> +}
> +
> +static u64 arm_cspmu_read_counter(struct perf_event *event)
> +{
> +	u32 offset;
> +	const void __iomem *counter_addr;
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +
> +	if (use_64b_counter_reg(cspmu)) {
> +		offset = counter_offset(sizeof(u64), event->hw.idx);
> +		counter_addr = cspmu->base1 + offset;
> +
> +		return supports_64bit_atomics(cspmu) ?
> +			       readq(counter_addr) :
> +			       read_reg64_hilohi(counter_addr);
> +	}
> +
> +	offset = counter_offset(sizeof(u32), event->hw.idx);
> +	return readl(cspmu->base1 + offset);
> +}
> +
> +/*
> + * arm_cspmu_set_event_period: Set the period for the counter.
> + *
> + * To handle cases of extreme interrupt latency, we program
> + * the counter with half of the max count for the counters.
> + */
> +static void arm_cspmu_set_event_period(struct perf_event *event)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	u64 val = counter_mask(cspmu) >> 1ULL;
> +
> +	local64_set(&event->hw.prev_count, val);
> +	arm_cspmu_write_counter(event, val);
> +}
> +
> +static void arm_cspmu_enable_counter(struct arm_cspmu *cspmu, int idx)
> +{
> +	u32 reg_id, reg_bit, inten_off, cnten_off;
> +
> +	reg_id = COUNTER_TO_SET_CLR_ID(idx);
> +	reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
> +
> +	inten_off = PMINTENSET + (4 * reg_id);
> +	cnten_off = PMCNTENSET + (4 * reg_id);
> +
> +	writel(BIT(reg_bit), cspmu->base0 + inten_off);
> +	writel(BIT(reg_bit), cspmu->base0 + cnten_off);
> +}
> +
> +static void arm_cspmu_disable_counter(struct arm_cspmu *cspmu, int idx)
> +{
> +	u32 reg_id, reg_bit, inten_off, cnten_off;
> +
> +	reg_id = COUNTER_TO_SET_CLR_ID(idx);
> +	reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
> +
> +	inten_off = PMINTENCLR + (4 * reg_id);
> +	cnten_off = PMCNTENCLR + (4 * reg_id);
> +
> +	writel(BIT(reg_bit), cspmu->base0 + cnten_off);
> +	writel(BIT(reg_bit), cspmu->base0 + inten_off);
> +}
> +
> +static void arm_cspmu_event_update(struct perf_event *event)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u64 delta, prev, now;
> +
> +	do {
> +		prev = local64_read(&hwc->prev_count);
> +		now = arm_cspmu_read_counter(event);
> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> +
> +	delta = (now - prev) & counter_mask(cspmu);
> +	local64_add(delta, &event->count);
> +}
> +
> +static inline void arm_cspmu_set_event(struct arm_cspmu *cspmu,
> +					struct hw_perf_event *hwc)
> +{
> +	u32 offset = PMEVTYPER + (4 * hwc->idx);
> +
> +	writel(hwc->config, cspmu->base0 + offset);
> +}
> +
> +static inline void arm_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
> +					   struct hw_perf_event *hwc,
> +					   u32 filter)
> +{
> +	u32 offset = PMEVFILTR + (4 * hwc->idx);
> +
> +	writel(filter, cspmu->base0 + offset);
> +}
> +
> +static inline void arm_cspmu_set_cc_filter(struct arm_cspmu *cspmu, u32 filter)
> +{
> +	u32 offset = PMCCFILTR;
> +
> +	writel(filter, cspmu->base0 + offset);
> +}
> +
> +static void arm_cspmu_start(struct perf_event *event, int pmu_flags)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	u32 filter;
> +
> +	/* We always reprogram the counter */
> +	if (pmu_flags & PERF_EF_RELOAD)
> +		WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
> +
> +	arm_cspmu_set_event_period(event);
> +
> +	filter = cspmu->impl.ops.event_filter(event);
> +
> +	if (event->hw.extra_reg.idx == cspmu->cycle_counter_logical_idx) {
> +		arm_cspmu_set_cc_filter(cspmu, filter);
> +	} else {
> +		arm_cspmu_set_event(cspmu, hwc);
> +		arm_cspmu_set_ev_filter(cspmu, hwc, filter);
> +	}
> +
> +	hwc->state = 0;
> +
> +	arm_cspmu_enable_counter(cspmu, hwc->idx);
> +}
> +
> +static void arm_cspmu_stop(struct perf_event *event, int pmu_flags)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (hwc->state & PERF_HES_STOPPED)
> +		return;
> +
> +	arm_cspmu_disable_counter(cspmu, hwc->idx);
> +	arm_cspmu_event_update(event);
> +
> +	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +}
> +
> +static inline u32 to_phys_idx(struct arm_cspmu *cspmu, u32 idx)
> +{
> +	return (idx == cspmu->cycle_counter_logical_idx) ?
> +		ARM_CSPMU_CYCLE_CNTR_IDX : idx;
> +}
> +
> +static int arm_cspmu_add(struct perf_event *event, int flags)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx;
> +
> +	if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> +					   &cspmu->associated_cpus)))
> +		return -ENOENT;
> +
> +	idx = arm_cspmu_get_event_idx(hw_events, event);
> +	if (idx < 0)
> +		return idx;
> +
> +	hw_events->events[idx] = event;
> +	hwc->idx = to_phys_idx(cspmu, idx);
> +	hwc->extra_reg.idx = idx;
> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +
> +	if (flags & PERF_EF_START)
> +		arm_cspmu_start(event, PERF_EF_RELOAD);
> +
> +	/* Propagate changes to the userspace mapping. */
> +	perf_event_update_userpage(event);
> +
> +	return 0;
> +}
> +
> +static void arm_cspmu_del(struct perf_event *event, int flags)
> +{
> +	struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> +	struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
> +	struct hw_perf_event *hwc = &event->hw;
> +	int idx = hwc->extra_reg.idx;
> +
> +	arm_cspmu_stop(event, PERF_EF_UPDATE);
> +
> +	hw_events->events[idx] = NULL;
> +
> +	clear_bit(idx, hw_events->used_ctrs);
> +
> +	perf_event_update_userpage(event);
> +}
> +
> +static void arm_cspmu_read(struct perf_event *event)
> +{
> +	arm_cspmu_event_update(event);
> +}
> +
> +static struct arm_cspmu *arm_cspmu_alloc(struct platform_device *pdev)
> +{
> +	struct acpi_apmt_node *apmt_node;
> +	struct arm_cspmu *cspmu;
> +	struct device *dev;
> +
> +	dev = &pdev->dev;
> +	apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> +	if (!apmt_node) {
> +		dev_err(dev, "failed to get APMT node\n");
> +		return NULL;
> +	}
> +
> +	cspmu = devm_kzalloc(dev, sizeof(*cspmu), GFP_KERNEL);
> +	if (!cspmu)
> +		return NULL;
> +
> +	cspmu->dev = dev;
> +	cspmu->apmt_node = apmt_node;
> +
> +	platform_set_drvdata(pdev, cspmu);
> +
> +	return cspmu;
> +}
> +
> +static int arm_cspmu_init_mmio(struct arm_cspmu *cspmu)
> +{
> +	struct device *dev;
> +	struct platform_device *pdev;
> +	struct acpi_apmt_node *apmt_node;
> +
> +	dev = cspmu->dev;
> +	pdev = to_platform_device(dev);
> +	apmt_node = cspmu->apmt_node;
> +
> +	/* Base address for page 0. */
> +	cspmu->base0 = devm_platform_ioremap_resource(pdev, 0);
> +	if (IS_ERR(cspmu->base0)) {
> +		dev_err(dev, "ioremap failed for page-0 resource\n");
> +		return PTR_ERR(cspmu->base0);
> +	}
> +
> +	/* Base address for page 1 if supported. Otherwise point to page 0. */
> +	cspmu->base1 = cspmu->base0;
> +	if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
> +		cspmu->base1 = devm_platform_ioremap_resource(pdev, 1);
> +		if (IS_ERR(cspmu->base1)) {
> +			dev_err(dev, "ioremap failed for page-1 resource\n");
> +			return PTR_ERR(cspmu->base1);
> +		}
> +	}
> +
> +	cspmu->pmcfgr = readl(cspmu->base0 + PMCFGR);
> +
> +	cspmu->num_logical_ctrs = FIELD_GET(PMCFGR_N, cspmu->pmcfgr) + 1;
> +
> +	cspmu->cycle_counter_logical_idx = ARM_CSPMU_MAX_HW_CNTRS;
> +
> +	if (supports_cycle_counter(cspmu)) {
> +		/*
> +		 * The last logical counter is mapped to cycle counter if
> +		 * there is a gap between regular and cycle counter. Otherwise,
> +		 * logical and physical have 1-to-1 mapping.
> +		 */
> +		cspmu->cycle_counter_logical_idx =
> +			(cspmu->num_logical_ctrs <= ARM_CSPMU_CYCLE_CNTR_IDX) ?
> +				cspmu->num_logical_ctrs - 1 :
> +				ARM_CSPMU_CYCLE_CNTR_IDX;
> +	}
> +
> +	cspmu->num_set_clr_reg =
> +		DIV_ROUND_UP(cspmu->num_logical_ctrs,
> +				ARM_CSPMU_SET_CLR_COUNTER_NUM);
> +
> +	cspmu->hw_events.events =
> +		devm_kcalloc(dev, cspmu->num_logical_ctrs,
> +			     sizeof(*cspmu->hw_events.events), GFP_KERNEL);
> +
> +	if (!cspmu->hw_events.events)
> +		return -ENOMEM;
> +
> +	return 0;
> +}
> +
> +static inline int arm_cspmu_get_reset_overflow(struct arm_cspmu *cspmu,
> +					       u32 *pmovs)
> +{
> +	int i;
> +	u32 pmovclr_offset = PMOVSCLR;
> +	u32 has_overflowed = 0;
> +
> +	for (i = 0; i < cspmu->num_set_clr_reg; ++i) {
> +		pmovs[i] = readl(cspmu->base1 + pmovclr_offset);
> +		has_overflowed |= pmovs[i];
> +		writel(pmovs[i], cspmu->base1 + pmovclr_offset);
> +		pmovclr_offset += sizeof(u32);
> +	}
> +
> +	return has_overflowed != 0;
> +}
> +
> +static irqreturn_t arm_cspmu_handle_irq(int irq_num, void *dev)
> +{
> +	int idx, has_overflowed;
> +	struct perf_event *event;
> +	struct arm_cspmu *cspmu = dev;
> +	u32 pmovs[ARM_CSPMU_SET_CLR_MAX_NUM] = { 0 };

nit: Could we not reuse what we do for hw_events.used_ctrs ?

i.e., DECLARE_BITMAP(pmovs, ARM_CSPMU_MAX_HW_CNTRS)

And remove ARM_CSPMU_SET_CLR_MAX_NUM altogether and the cast below
to (unsigned long *).
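
e.g., roughly (untested sketch, assuming arm_cspmu_get_reset_overflow()
is reworked to fill the bitmap directly):

	DECLARE_BITMAP(pmovs, ARM_CSPMU_MAX_HW_CNTRS);

	bitmap_zero(pmovs, ARM_CSPMU_MAX_HW_CNTRS);
	if (!arm_cspmu_get_reset_overflow(cspmu, pmovs))
		return IRQ_NONE;

	for_each_set_bit(idx, pmovs, cspmu->num_logical_ctrs) {
		/* handle the overflowed counter at idx */
	}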

With that

Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Suzuki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  2022-08-14 18:23 ` [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
@ 2022-09-27 11:42   ` Suzuki K Poulose
  2022-09-28  1:38     ` Besar Wicaksono
  0 siblings, 1 reply; 13+ messages in thread
From: Suzuki K Poulose @ 2022-09-27 11:42 UTC (permalink / raw)
  To: Besar Wicaksono, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, treding, jonathanh, vsethi,
	mathieu.poirier, mike.leach, leo.yan

On 14/08/2022 19:23, Besar Wicaksono wrote:
> Add support for NVIDIA System Cache Fabric (SCF) and Memory Controller
> Fabric (MCF) PMU attributes for the CoreSight PMU implementation in
> NVIDIA devices.
> 
> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> ---
>   Documentation/admin-guide/perf/index.rst      |   1 +
>   Documentation/admin-guide/perf/nvidia-pmu.rst | 120 ++++++
>   drivers/perf/arm_cspmu/Makefile               |   3 +-
>   drivers/perf/arm_cspmu/arm_cspmu.c            |   7 +
>   drivers/perf/arm_cspmu/nvidia_cspmu.c         | 367 ++++++++++++++++++
>   drivers/perf/arm_cspmu/nvidia_cspmu.h         |  17 +
>   6 files changed, 514 insertions(+), 1 deletion(-)
>   create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
>   create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
>   create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
> 
> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> index 69b23f087c05..cf05fed1f67f 100644
> --- a/Documentation/admin-guide/perf/index.rst
> +++ b/Documentation/admin-guide/perf/index.rst
> @@ -17,3 +17,4 @@ Performance monitor support
>      xgene-pmu
>      arm_dsu_pmu
>      thunderx2-pmu
> +   nvidia-pmu
> diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
> new file mode 100644
> index 000000000000..c41b93965824
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
> @@ -0,0 +1,120 @@
> +=========================================================
> +NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
> +=========================================================
> +
> +The NVIDIA Tegra SoC includes various system PMUs to measure key performance
> +metrics like memory bandwidth, latency, and utilization:
> +
> +* Scalable Coherency Fabric (SCF)
> +* Memory Controller Fabric (MCF) GPU physical interface
> +* MCF GPU virtual interface
> +* MCF NVLINK interface
> +* MCF PCIE interface
> +
> +PMU Driver
> +----------
> +
> +The PMUs in this document are based on the ARM CoreSight PMU Architecture as
> +described in document ARM IHI 0091. Since this is a standard architecture, the
> +PMUs are managed by a common driver, "arm-cs-arch-pmu". This driver describes
> +the available events and configuration of each PMU in sysfs. Please see the
> +sections below to get the sysfs path of each PMU. Like other uncore PMU
> +drivers, the driver provides a "cpumask" sysfs attribute to show the CPU id
> +used to handle the PMU events. There is also an "associated_cpus" sysfs
> +attribute, which contains a list of CPUs associated with the PMU instance.
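> +
> +For example, the CPU used to handle events for the SCF PMU instance in
> +socket 0 (see below for its sysfs path) can be read with::
> +
> +  cat /sys/bus/event_sources/devices/nvidia_scf_pmu_0/cpumask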
> +
> +SCF PMU
> +-------
> +
> +The SCF PMU monitors system-level cache events, CPU traffic, and
> +strongly-ordered PCIE traffic to local/remote memory.
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_sources/devices/nvidia_scf_pmu_<socket-id>.
> +
> +Example usage::
> +
> +  perf stat -a -e nvidia_scf_pmu_0/config=0x0/
> +
> +This will count the events in socket 0.
> +
> +MCF GPU Physical PMU
> +--------------------
> +
> +The MCF GPU physical PMU monitors ATS-translated traffic from the GPU to
> +local/remote memory via NVLink C2C.
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_sources/devices/nvidia_mcf_gpu_pmu_<socket-id>.
> +
> +Multiple GPUs can be connected to the SoC. The user can use the "gpu" bitmap
> +parameter to select the GPU(s) to monitor, e.g. "gpu=0xF" corresponds to GPUs
> +0 to 3. /sys/bus/event_sources/devices/nvidia_mcf_gpu_pmu_<socket-id>/format/gpu
> +shows the valid bits that can be set in the "gpu" parameter.
> +
> +Example usage::
> +
> +  perf stat -a -e nvidia_mcf_gpu_pmu_0/config=0x0,gpu=0x3/
> +
> +This will count the events on GPUs 0 and 1 connected to the SoC in socket 0.
> +
> +MCF GPU Virtual PMU
> +-------------------
> +
> +The MCF GPU virtual PMU monitors SMMU inline-translated traffic (as opposed to
> +ATS) from the GPU to local/remote memory via NVLink C2C.
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_sources/devices/nvidia_mcf_gpuvir_pmu_<socket-id>.
> +
> +Multiple GPUs can be connected to the SoC. The user can use the "gpu" bitmap
> +parameter to select the GPU(s) to monitor, e.g. "gpu=0xF" corresponds to GPUs
> +0 to 3. /sys/bus/event_sources/devices/nvidia_mcf_gpuvir_pmu_<socket-id>/format/gpu
> +shows the valid bits that can be set in the "gpu" parameter.
> +
> +Example usage::
> +
> +  perf stat -a -e nvidia_mcf_gpuvir_pmu_0/config=0x0,gpu=0x3/
> +
> +This will count the events on GPUs 0 and 1 connected to the SoC in socket 0.
> +
> +MCF NVLINK PMU
> +--------------
> +
> +The MCF NVLINK PMU monitors I/O coherent traffic from external socket to local
> +memory.
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_sources/devices/nvidia_mcf_nvlink_pmu_<socket-id>.
> +
> +Each SoC socket can be connected to one or more sockets via NVLINK. The user
> +can use the "rem_socket" bitmap parameter to select the remote socket(s) to
> +monitor, e.g. "rem_socket=0xE" corresponds to sockets 1 to 3.
> +/sys/bus/event_sources/devices/nvidia_mcf_nvlink_pmu_<socket-id>/format/rem_socket
> +shows the valid bits that can be set in the "rem_socket" parameter.
> +
> +Example usage::
> +
> +  perf stat -a -e nvidia_mcf_nvlink_pmu_0/config=0x0,rem_socket=0x6/
> +
> +This will count the events from remote sockets 1 and 2 to socket 0.
> +
> +MCF PCIE PMU
> +------------
> +
> +The MCF PCIE PMU monitors traffic from PCIE root ports to local/remote memory.
> +
> +The events and configuration options of this PMU device are described in sysfs,
> +see /sys/bus/event_sources/devices/nvidia_mcf_pcie_pmu_<socket-id>.
> +
> +Each SoC socket can support multiple root ports. The user can use the
> +"root_port" bitmap parameter to select the port(s) to monitor, e.g.
> +"root_port=0xF" corresponds to root ports 0 to 3.
> +/sys/bus/event_sources/devices/nvidia_mcf_pcie_pmu_<socket-id>/format/root_port
> +shows the valid bits that can be set in the "root_port" parameter.
> +
> +Example usage::
> +
> +  perf stat -a -e nvidia_mcf_pcie_pmu_0/config=0x0,root_port=0x3/
> +
> +This will count the events from root ports 0 and 1 of socket 0.
> diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
> index cdc3455f74d8..1b586064bd77 100644
> --- a/drivers/perf/arm_cspmu/Makefile
> +++ b/drivers/perf/arm_cspmu/Makefile
> @@ -3,4 +3,5 @@
>   # SPDX-License-Identifier: GPL-2.0
>   
>   obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
> -	arm_cspmu.o
> +	arm_cspmu.o \
> +	nvidia_cspmu.o
> diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> index 410876f86eb0..7a0beb515e53 100644
> --- a/drivers/perf/arm_cspmu/arm_cspmu.c
> +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> @@ -31,6 +31,7 @@
>   #include <acpi/processor.h>
>   
>   #include "arm_cspmu.h"
> +#include "nvidia_cspmu.h"
>   
>   #define PMUNAME "arm_cspmu"
>   #define DRVNAME "arm-cs-arch-pmu"
> @@ -118,6 +119,9 @@ static_assert(
>   			ops->callback = arm_cspmu_ ## callback;	\
>   	} while (0)
>   
> +/* JEDEC-assigned JEP106 identification code */
> +#define ARM_CSPMU_IMPL_ID_NVIDIA		0x36B
> +
>   static unsigned long arm_cspmu_cpuhp_state;
>   
>   /*
> @@ -369,6 +373,9 @@ struct impl_match {
>   };
>   
>   static const struct impl_match impl_match[] = {
> +	{ .pmiidr = ARM_CSPMU_IMPL_ID_NVIDIA,
> +	  .mask = ARM_CSPMU_PMIIDR_IMPLEMENTER,
> +	  .impl_init_ops = nv_cspmu_init_ops },

Super minor nit: Coding style. Could we use:

	{
		.field = value,
		...
	},
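
i.e., the new entry would become:

	{
		.pmiidr		= ARM_CSPMU_IMPL_ID_NVIDIA,
		.mask		= ARM_CSPMU_PMIIDR_IMPLEMENTER,
		.impl_init_ops	= nv_cspmu_init_ops,
	},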

>   	{}
>   };
>   
> diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> new file mode 100644
> index 000000000000..261f20680bc1
> --- /dev/null
> +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> @@ -0,0 +1,367 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> + *
> + */
> +
> +/* Support for NVIDIA specific attributes. */
> +
> +#include "nvidia_cspmu.h"
> +
> +#define NV_MCF_PCIE_PORT_COUNT       10ULL
> +#define NV_MCF_PCIE_FILTER_ID_MASK   GENMASK_ULL(NV_MCF_PCIE_PORT_COUNT - 1, 0)
> +
> +#define NV_MCF_GPU_PORT_COUNT        2ULL
> +#define NV_MCF_GPU_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_GPU_PORT_COUNT - 1, 0)
> +
> +#define NV_MCF_NVL_PORT_COUNT        4ULL
> +#define NV_MCF_NVL_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_NVL_PORT_COUNT - 1, 0)
> +
> +#define NV_SCF_MCF_PRODID_MASK       GENMASK(31, 0)
> +
> +#define NV_FORMAT_NAME_GENERIC	0
> +
> +#define to_nv_cspmu_ctx(cspmu)	((struct nv_cspmu_ctx *)(cspmu->impl.ctx))
> +
> +#define NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)	\
> +	ARM_CSPMU_EVENT_ATTR(_pref##_num##_suff, _config)
> +
> +#define NV_CSPMU_EVENT_ATTR_4(_pref, _suff, _config)			\
> +	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),	\
> +	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),	\
> +	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),	\
> +	NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
> +
> +struct nv_cspmu_ctx {
> +	const char *name;
> +	u32 filter_mask;
> +	struct attribute **event_attr;
> +	struct attribute **format_attr;
> +};
> +
> +static struct attribute *scf_pmu_event_attrs[] = {
> +	ARM_CSPMU_EVENT_ATTR(bus_cycles,			0x1d),
> +
> +	ARM_CSPMU_EVENT_ATTR(scf_cache_allocate,		0xF0),
> +	ARM_CSPMU_EVENT_ATTR(scf_cache_refill,			0xF1),
> +	ARM_CSPMU_EVENT_ATTR(scf_cache,				0xF2),
> +	ARM_CSPMU_EVENT_ATTR(scf_cache_wb,			0xF3),
> +
> +	NV_CSPMU_EVENT_ATTR_4(socket, rd_data,			0x101),
> +	NV_CSPMU_EVENT_ATTR_4(socket, dl_rsp,			0x105),
> +	NV_CSPMU_EVENT_ATTR_4(socket, wb_data,			0x109),
> +	NV_CSPMU_EVENT_ATTR_4(socket, ev_rsp,			0x10d),
> +	NV_CSPMU_EVENT_ATTR_4(socket, prb_data,			0x111),
> +
> +	NV_CSPMU_EVENT_ATTR_4(socket, rd_outstanding,		0x115),
> +	NV_CSPMU_EVENT_ATTR_4(socket, dl_outstanding,		0x119),
> +	NV_CSPMU_EVENT_ATTR_4(socket, wb_outstanding,		0x11d),
> +	NV_CSPMU_EVENT_ATTR_4(socket, wr_outstanding,		0x121),
> +	NV_CSPMU_EVENT_ATTR_4(socket, ev_outstanding,		0x125),
> +	NV_CSPMU_EVENT_ATTR_4(socket, prb_outstanding,		0x129),
> +
> +	NV_CSPMU_EVENT_ATTR_4(socket, rd_access,		0x12d),
> +	NV_CSPMU_EVENT_ATTR_4(socket, dl_access,		0x131),
> +	NV_CSPMU_EVENT_ATTR_4(socket, wb_access,		0x135),
> +	NV_CSPMU_EVENT_ATTR_4(socket, wr_access,		0x139),
> +	NV_CSPMU_EVENT_ATTR_4(socket, ev_access,		0x13d),
> +	NV_CSPMU_EVENT_ATTR_4(socket, prb_access,		0x141),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_data,		0x145),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_access,		0x149),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_access,		0x14d),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_outstanding,		0x151),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_outstanding,		0x155),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_data,			0x159),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_access,		0x15d),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_access,		0x161),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_outstanding,		0x165),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_outstanding,		0x169),
> +
> +	ARM_CSPMU_EVENT_ATTR(gmem_rd_data,			0x16d),
> +	ARM_CSPMU_EVENT_ATTR(gmem_rd_access,			0x16e),
> +	ARM_CSPMU_EVENT_ATTR(gmem_rd_outstanding,		0x16f),
> +	ARM_CSPMU_EVENT_ATTR(gmem_dl_rsp,			0x170),
> +	ARM_CSPMU_EVENT_ATTR(gmem_dl_access,			0x171),
> +	ARM_CSPMU_EVENT_ATTR(gmem_dl_outstanding,		0x172),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wb_data,			0x173),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wb_access,			0x174),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wb_outstanding,		0x175),
> +	ARM_CSPMU_EVENT_ATTR(gmem_ev_rsp,			0x176),
> +	ARM_CSPMU_EVENT_ATTR(gmem_ev_access,			0x177),
> +	ARM_CSPMU_EVENT_ATTR(gmem_ev_outstanding,		0x178),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wr_data,			0x179),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wr_outstanding,		0x17a),
> +	ARM_CSPMU_EVENT_ATTR(gmem_wr_access,			0x17b),
> +
> +	NV_CSPMU_EVENT_ATTR_4(socket, wr_data,			0x17c),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_data,		0x180),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_data,		0x184),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_access,		0x188),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_outstanding,		0x18c),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_data,			0x190),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_data,			0x194),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_access,		0x198),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_outstanding,		0x19c),
> +
> +	ARM_CSPMU_EVENT_ATTR(gmem_wr_total_bytes,		0x1a0),
> +	ARM_CSPMU_EVENT_ATTR(remote_socket_wr_total_bytes,	0x1a1),
> +	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_data,		0x1a2),
> +	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_outstanding,	0x1a3),
> +	ARM_CSPMU_EVENT_ATTR(remote_socket_rd_access,		0x1a4),
> +
> +	ARM_CSPMU_EVENT_ATTR(cmem_rd_data,			0x1a5),
> +	ARM_CSPMU_EVENT_ATTR(cmem_rd_access,			0x1a6),
> +	ARM_CSPMU_EVENT_ATTR(cmem_rd_outstanding,		0x1a7),
> +	ARM_CSPMU_EVENT_ATTR(cmem_dl_rsp,			0x1a8),
> +	ARM_CSPMU_EVENT_ATTR(cmem_dl_access,			0x1a9),
> +	ARM_CSPMU_EVENT_ATTR(cmem_dl_outstanding,		0x1aa),
> +	ARM_CSPMU_EVENT_ATTR(cmem_wb_data,			0x1ab),
> +	ARM_CSPMU_EVENT_ATTR(cmem_wb_access,			0x1ac),
> +	ARM_CSPMU_EVENT_ATTR(cmem_wb_outstanding,		0x1ad),
> +	ARM_CSPMU_EVENT_ATTR(cmem_ev_rsp,			0x1ae),
> +	ARM_CSPMU_EVENT_ATTR(cmem_ev_access,			0x1af),
> +	ARM_CSPMU_EVENT_ATTR(cmem_ev_outstanding,		0x1b0),
> +	ARM_CSPMU_EVENT_ATTR(cmem_wr_data,			0x1b1),
> +	ARM_CSPMU_EVENT_ATTR(cmem_wr_outstanding,		0x1b2),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_data,		0x1b3),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_access,		0x1b7),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_access,		0x1bb),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_outstanding,		0x1bf),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_outstanding,		0x1c3),
> +
> +	ARM_CSPMU_EVENT_ATTR(ocu_prb_access,			0x1c7),
> +	ARM_CSPMU_EVENT_ATTR(ocu_prb_data,			0x1c8),
> +	ARM_CSPMU_EVENT_ATTR(ocu_prb_outstanding,		0x1c9),
> +
> +	ARM_CSPMU_EVENT_ATTR(cmem_wr_access,			0x1ca),
> +
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_access,		0x1cb),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_data,		0x1cf),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_data,		0x1d3),
> +	NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_outstanding,		0x1d7),
> +
> +	ARM_CSPMU_EVENT_ATTR(cmem_wr_total_bytes,		0x1db),
> +
> +	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> +static struct attribute *mcf_pmu_event_attrs[] = {
> +	ARM_CSPMU_EVENT_ATTR(rd_bytes_loc,			0x0),
> +	ARM_CSPMU_EVENT_ATTR(rd_bytes_rem,			0x1),
> +	ARM_CSPMU_EVENT_ATTR(wr_bytes_loc,			0x2),
> +	ARM_CSPMU_EVENT_ATTR(wr_bytes_rem,			0x3),
> +	ARM_CSPMU_EVENT_ATTR(total_bytes_loc,			0x4),
> +	ARM_CSPMU_EVENT_ATTR(total_bytes_rem,			0x5),
> +	ARM_CSPMU_EVENT_ATTR(rd_req_loc,			0x6),
> +	ARM_CSPMU_EVENT_ATTR(rd_req_rem,			0x7),
> +	ARM_CSPMU_EVENT_ATTR(wr_req_loc,			0x8),
> +	ARM_CSPMU_EVENT_ATTR(wr_req_rem,			0x9),
> +	ARM_CSPMU_EVENT_ATTR(total_req_loc,			0xa),
> +	ARM_CSPMU_EVENT_ATTR(total_req_rem,			0xb),
> +	ARM_CSPMU_EVENT_ATTR(rd_cum_outs_loc,			0xc),
> +	ARM_CSPMU_EVENT_ATTR(rd_cum_outs_rem,			0xd),
> +	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> +static struct attribute *generic_pmu_event_attrs[] = {
> +	ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> +	NULL,
> +};
> +
> +static struct attribute *scf_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	NULL,
> +};
> +
> +static struct attribute *mcf_pcie_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_ATTR(root_port, "config1:0-9"),
> +	NULL,
> +};
> +
> +static struct attribute *mcf_gpu_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_ATTR(gpu, "config1:0-1"),
> +	NULL,
> +};
> +
> +static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_ATTR(rem_socket, "config1:0-3"),
> +	NULL,
> +};
> +
> +static struct attribute *generic_pmu_format_attrs[] = {
> +	ARM_CSPMU_FORMAT_EVENT_ATTR,
> +	ARM_CSPMU_FORMAT_FILTER_ATTR,
> +	NULL,
> +};
> +
> +static struct attribute **
> +nv_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
> +{
> +	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> +
> +	return ctx->event_attr;
> +}
> +
> +static struct attribute **
> +nv_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
> +{
> +	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> +
> +	return ctx->format_attr;
> +}
> +
> +static const char *
> +nv_cspmu_get_name(const struct arm_cspmu *cspmu)
> +{
> +	const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> +
> +	return ctx->name;
> +}
> +
> +static u32 nv_cspmu_event_filter(const struct perf_event *event)
> +{
> +	const struct nv_cspmu_ctx *ctx =
> +		to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
> +
> +	return event->attr.config1 & ctx->filter_mask;
> +}
> +
> +enum nv_cspmu_name_fmt {
> +	NAME_FMT_GENERIC,
> +	NAME_FMT_PROC
> +};
> +
> +struct nv_cspmu_match {
> +	u32 prodid;
> +	u32 prodid_mask;
> +	u64 filter_mask;
> +	const char *name_pattern;
> +	enum nv_cspmu_name_fmt name_fmt;
> +	struct attribute **event_attr;
> +	struct attribute **format_attr;
> +};
> +
> +static const struct nv_cspmu_match nv_cspmu_match[] = {

Similar coding style nit below.
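
i.e., one field per line for each entry (sketch only, <values> as in the
patch):

	{
		.prodid		= <prodid>,
		.prodid_mask	= <mask>,
		...
	},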


Otherwise,

Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-09-27 11:39   ` Suzuki K Poulose
@ 2022-09-28  1:27     ` Besar Wicaksono
  0 siblings, 0 replies; 13+ messages in thread
From: Besar Wicaksono @ 2022-09-28  1:27 UTC (permalink / raw)
  To: Suzuki K Poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan



> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> Sent: Tuesday, September 27, 2022 6:39 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; mathieu.poirier@linaro.org;
> mike.leach@linaro.org; leo.yan@linaro.org
> Subject: Re: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM
> CoreSight PMU driver
> 
> On 14/08/2022 19:23, Besar Wicaksono wrote:
> > Add support for the ARM CoreSight PMU driver framework and interfaces.
> > The driver provides a generic implementation to operate uncore PMUs based
> > on the ARM CoreSight PMU architecture. The driver also provides an
> > interface to get vendor/implementation-specific information, for example
> > event attributes and formatting.
> >
> > The specification used in this implementation can be found below:
> >   * ACPI Arm Performance Monitoring Unit table:
> >          https://developer.arm.com/documentation/den0117/latest
> >   * ARM Coresight PMU architecture:
> >          https://developer.arm.com/documentation/ihi0091/latest
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >   arch/arm64/configs/defconfig       |    1 +
> >   drivers/perf/Kconfig               |    2 +
> >   drivers/perf/Makefile              |    1 +
> >   drivers/perf/arm_cspmu/Kconfig     |   13 +
> >   drivers/perf/arm_cspmu/Makefile    |    6 +
> >   drivers/perf/arm_cspmu/arm_cspmu.c | 1262 ++++++++++++++++
> >   drivers/perf/arm_cspmu/arm_cspmu.h |  151 ++++
> >   7 files changed, 1436 insertions(+)
> >   create mode 100644 drivers/perf/arm_cspmu/Kconfig
> >   create mode 100644 drivers/perf/arm_cspmu/Makefile
> >   create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.c
> >   create mode 100644 drivers/perf/arm_cspmu/arm_cspmu.h
> >
> > diff --git a/arch/arm64/configs/defconfig b/arch/arm64/configs/defconfig
> > index 7d1105343bc2..ee31c9159a5b 100644
> > --- a/arch/arm64/configs/defconfig
> > +++ b/arch/arm64/configs/defconfig
> > @@ -1212,6 +1212,7 @@ CONFIG_PHY_UNIPHIER_USB3=y
> >   CONFIG_PHY_TEGRA_XUSB=y
> >   CONFIG_PHY_AM654_SERDES=m
> >   CONFIG_PHY_J721E_WIZ=m
> > +CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU=y
> >   CONFIG_ARM_SMMU_V3_PMU=m
> >   CONFIG_FSL_IMX8_DDR_PMU=m
> >   CONFIG_QCOM_L2_PMU=y
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 1e2d69453771..c94d3601eb48 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -192,4 +192,6 @@ config MARVELL_CN10K_DDR_PMU
> >         Enable perf support for Marvell DDR Performance monitoring
> >         event on CN10K platform.
> >
> > +source "drivers/perf/arm_cspmu/Kconfig"
> > +
> >   endmenu
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index 57a279c61df5..3bc9323f0965 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -20,3 +20,4 @@ obj-$(CONFIG_ARM_DMC620_PMU) += arm_dmc620_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
> >   obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
> >   obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
> > diff --git a/drivers/perf/arm_cspmu/Kconfig b/drivers/perf/arm_cspmu/Kconfig
> > new file mode 100644
> > index 000000000000..c2c56ecafccb
> > --- /dev/null
> > +++ b/drivers/perf/arm_cspmu/Kconfig
> > @@ -0,0 +1,13 @@
> > +# SPDX-License-Identifier: GPL-2.0
> > +#
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +
> > +config ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU
> > +     tristate "ARM Coresight Architecture PMU"
> > +     depends on ACPI
> > +     depends on ACPI_APMT || COMPILE_TEST
> > +     help
> > +       Provides support for performance monitoring unit (PMU) devices
> > +       based on ARM CoreSight PMU architecture. Note that this PMU
> > +       architecture has no relationship with the ARM CoreSight
> > +       Self-Hosted Tracing.
> > diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
> > new file mode 100644
> > index 000000000000..cdc3455f74d8
> > --- /dev/null
> > +++ b/drivers/perf/arm_cspmu/Makefile
> > @@ -0,0 +1,6 @@
> > +# Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > +#
> > +# SPDX-License-Identifier: GPL-2.0
> > +
> > +obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
> > +     arm_cspmu.o
> > diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> > new file mode 100644
> > index 000000000000..410876f86eb0
> > --- /dev/null
> > +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> > @@ -0,0 +1,1262 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * ARM CoreSight Architecture PMU driver.
> > + *
> > + * This driver adds support for uncore PMU based on ARM CoreSight Performance
> > + * Monitoring Unit Architecture. The PMU is accessible via MMIO registers and
> > + * like other uncore PMUs, it does not support process specific events and
> > + * cannot be used in sampling mode.
> > + *
> > + * This code is based on other uncore PMUs like ARM DSU PMU. It provides a
> > + * generic implementation to operate the PMU according to CoreSight PMU
> > + * architecture and ACPI ARM PMU table (APMT) documents below:
> > + *   - ARM CoreSight PMU architecture document number: ARM IHI 0091 A.a-00bet0.
> > + *   - APMT document number: ARM DEN0117.
> > + *
> > + * The user should refer to the vendor technical documentation to get details
> > + * about the supported events.
> > + *
> > + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > + *
> > + */
> > +
> > +#include <linux/acpi.h>
> > +#include <linux/cacheinfo.h>
> > +#include <linux/ctype.h>
> > +#include <linux/interrupt.h>
> > +#include <linux/io-64-nonatomic-lo-hi.h>
> > +#include <linux/module.h>
> > +#include <linux/perf_event.h>
> > +#include <linux/platform_device.h>
> > +#include <acpi/processor.h>
> > +
> > +#include "arm_cspmu.h"
> > +
> > +#define PMUNAME "arm_cspmu"
> > +#define DRVNAME "arm-cs-arch-pmu"
> > +
> > +#define ARM_CSPMU_CPUMASK_ATTR(_name, _config)                       \
> > +     ARM_CSPMU_EXT_ATTR(_name, arm_cspmu_cpumask_show,       \
> > +                             (unsigned long)_config)
> > +
> > +/*
> > + * CoreSight PMU Arch register offsets.
> > + */
> > +#define PMEVCNTR_LO                                  0x0
> > +#define PMEVCNTR_HI                                  0x4
> > +#define PMEVTYPER                                    0x400
> > +#define PMCCFILTR                                    0x47C
> > +#define PMEVFILTR                                    0xA00
> > +#define PMCNTENSET                                   0xC00
> > +#define PMCNTENCLR                                   0xC20
> > +#define PMINTENSET                                   0xC40
> > +#define PMINTENCLR                                   0xC60
> > +#define PMOVSCLR                                     0xC80
> > +#define PMOVSSET                                     0xCC0
> > +#define PMCFGR                                               0xE00
> > +#define PMCR                                         0xE04
> > +#define PMIIDR                                               0xE08
> > +
> > +/* PMCFGR register field */
> > +#define PMCFGR_NCG                                   GENMASK(31, 28)
> > +#define PMCFGR_HDBG                                  BIT(24)
> > +#define PMCFGR_TRO                                   BIT(23)
> > +#define PMCFGR_SS                                    BIT(22)
> > +#define PMCFGR_FZO                                   BIT(21)
> > +#define PMCFGR_MSI                                   BIT(20)
> > +#define PMCFGR_UEN                                   BIT(19)
> > +#define PMCFGR_NA                                    BIT(17)
> > +#define PMCFGR_EX                                    BIT(16)
> > +#define PMCFGR_CCD                                   BIT(15)
> > +#define PMCFGR_CC                                    BIT(14)
> > +#define PMCFGR_SIZE                                  GENMASK(13, 8)
> > +#define PMCFGR_N                                     GENMASK(7, 0)
> > +
> > +/* PMCR register field */
> > +#define PMCR_TRO                                     BIT(11)
> > +#define PMCR_HDBG                                    BIT(10)
> > +#define PMCR_FZO                                     BIT(9)
> > +#define PMCR_NA                                              BIT(8)
> > +#define PMCR_DP                                              BIT(5)
> > +#define PMCR_X                                               BIT(4)
> > +#define PMCR_D                                               BIT(3)
> > +#define PMCR_C                                               BIT(2)
> > +#define PMCR_P                                               BIT(1)
> > +#define PMCR_E                                               BIT(0)
> > +
> > +/* Each SET/CLR register supports up to 32 counters. */
> > +#define ARM_CSPMU_SET_CLR_COUNTER_SHIFT              5
> > +#define ARM_CSPMU_SET_CLR_COUNTER_NUM                \
> > +     (1 << ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
> > +
> > +/* The number of 32-bit SET/CLR register that can be supported. */
> > +#define ARM_CSPMU_SET_CLR_MAX_NUM ((PMCNTENCLR - PMCNTENSET) / sizeof(u32))
> > +
> > +static_assert(
> > +     (ARM_CSPMU_SET_CLR_MAX_NUM * ARM_CSPMU_SET_CLR_COUNTER_NUM) >=
> > +     ARM_CSPMU_MAX_HW_CNTRS);
> > +
> > +/* Convert counter idx into SET/CLR register number. */
> > +#define COUNTER_TO_SET_CLR_ID(idx)                   \
> > +     (idx >> ARM_CSPMU_SET_CLR_COUNTER_SHIFT)
> > +
> > +/* Convert counter idx into SET/CLR register bit. */
> > +#define COUNTER_TO_SET_CLR_BIT(idx)                  \
> > +     (idx & (ARM_CSPMU_SET_CLR_COUNTER_NUM - 1))
> > +
> > +#define ARM_CSPMU_ACTIVE_CPU_MASK            0x0
> > +#define ARM_CSPMU_ASSOCIATED_CPU_MASK                0x1
> > +
> > +/* Check if field f in flags is set with value v */
> > +#define CHECK_APMT_FLAG(flags, f, v) \
> > +     ((flags & (ACPI_APMT_FLAGS_ ## f)) == (ACPI_APMT_FLAGS_ ## f ## _ ## v))
> > +
> > +/* Check and use default if implementer doesn't provide attribute callback */
> > +#define CHECK_DEFAULT_IMPL_OPS(ops, callback)                        \
> > +     do {                                                    \
> > +             if (!ops->callback)                             \
> > +                     ops->callback = arm_cspmu_ ## callback; \
> > +     } while (0)
> > +
> > +static unsigned long arm_cspmu_cpuhp_state;
> > +
> > +/*
> > + * In CoreSight PMU architecture, all of the MMIO registers are 32-bit except
> > + * counter register. The counter register can be implemented as 32-bit or 64-bit
> > + * register depending on the value of PMCFGR.SIZE field. For 64-bit access,
> > + * single-copy 64-bit atomic support is implementation defined. APMT node flag
> > + * is used to identify if the PMU supports 64-bit single copy atomic. If 64-bit
> > + * single copy atomic is not supported, the driver treats the register as a pair
> > + * of 32-bit registers.
> > + */
> > +
> > +/*
> > + * Read 64-bit register as a pair of 32-bit registers using hi-lo-hi sequence.
> > + */
> > +static u64 read_reg64_hilohi(const void __iomem *addr)
> > +{
> > +     u32 val_lo, val_hi;
> > +     u64 val;
> > +
> > +     /* Use high-low-high sequence to avoid tearing */
> > +     do {
> > +             val_hi = readl(addr + 4);
> > +             val_lo = readl(addr);
> > +     } while (val_hi != readl(addr + 4));
> > +
> > +     val = (((u64)val_hi << 32) | val_lo);
> > +
> > +     return val;
> > +}
> > +
> > +/* Check if PMU supports 64-bit single copy atomic. */
> > +static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
> > +{
> > +     return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
> > +}
> > +
> > +/* Check if cycle counter is supported. */
> > +static inline bool supports_cycle_counter(const struct arm_cspmu *cspmu)
> > +{
> > +     return (cspmu->pmcfgr & PMCFGR_CC);
> > +}
> > +
> > +/* Get counter size, which is (PMCFGR_SIZE + 1). */
> > +static inline u32 counter_size(const struct arm_cspmu *cspmu)
> > +{
> > +     return FIELD_GET(PMCFGR_SIZE, cspmu->pmcfgr) + 1;
> > +}
> > +
> > +/* Get counter mask. */
> > +static inline u64 counter_mask(const struct arm_cspmu *cspmu)
> > +{
> > +     return GENMASK_ULL(counter_size(cspmu) - 1, 0);
> > +}
> > +
> > +/* Check if counter is implemented as 64-bit register. */
> > +static inline bool use_64b_counter_reg(const struct arm_cspmu *cspmu)
> > +{
> > +     return (counter_size(cspmu) > 32);
> > +}
> > +
> > +ssize_t arm_cspmu_sysfs_event_show(struct device *dev,
> > +                             struct device_attribute *attr, char *buf)
> > +{
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     return sysfs_emit(buf, "event=0x%llx\n",
> > +                       (unsigned long long)eattr->var);
> > +}
> > +EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_event_show);
> > +
> > +/* Default event list. */
> > +static struct attribute *arm_cspmu_event_attrs[] = {
> > +     ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> > +     NULL,
> > +};
> > +
> > +static struct attribute **
> > +arm_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
> > +{
> > +     return arm_cspmu_event_attrs;
> > +}
> > +
> > +static umode_t
> > +arm_cspmu_event_attr_is_visible(struct kobject *kobj,
> > +                             struct attribute *attr, int unused)
> > +{
> > +     struct device *dev = kobj_to_dev(kobj);
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
> > +     struct perf_pmu_events_attr *eattr;
> > +
> > +     eattr = container_of(attr, typeof(*eattr), attr.attr);
> > +
> > +     /* Hide cycle event if not supported */
> > +     if (!supports_cycle_counter(cspmu) &&
> > +         eattr->id == ARM_CSPMU_EVT_CYCLES_DEFAULT)
> > +             return 0;
> > +
> > +     return attr->mode;
> > +}
> > +
> > +ssize_t arm_cspmu_sysfs_format_show(struct device *dev,
> > +                             struct device_attribute *attr,
> > +                             char *buf)
> > +{
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     return sysfs_emit(buf, "%s\n", (char *)eattr->var);
> > +}
> > +EXPORT_SYMBOL_GPL(arm_cspmu_sysfs_format_show);
> > +
> > +static struct attribute *arm_cspmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     ARM_CSPMU_FORMAT_FILTER_ATTR,
> > +     NULL,
> > +};
> > +
> > +static struct attribute **
> > +arm_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
> > +{
> > +     return arm_cspmu_format_attrs;
> > +}
> > +
> > +static u32 arm_cspmu_event_type(const struct perf_event *event)
> > +{
> > +     return event->attr.config & ARM_CSPMU_EVENT_MASK;
> > +}
> > +
> > +static bool arm_cspmu_is_cycle_counter_event(const struct perf_event *event)
> > +{
> > +     return (event->attr.config == ARM_CSPMU_EVT_CYCLES_DEFAULT);
> > +}
> > +
> > +static u32 arm_cspmu_event_filter(const struct perf_event *event)
> > +{
> > +     return event->attr.config1 & ARM_CSPMU_FILTER_MASK;
> > +}
> > +
> > +static ssize_t arm_cspmu_identifier_show(struct device *dev,
> > +                                      struct device_attribute *attr,
> > +                                      char *page)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(dev_get_drvdata(dev));
> > +
> > +     return sysfs_emit(page, "%s\n", cspmu->identifier);
> > +}
> > +
> > +static struct device_attribute arm_cspmu_identifier_attr =
> > +     __ATTR(identifier, 0444, arm_cspmu_identifier_show, NULL);
> > +
> > +static struct attribute *arm_cspmu_identifier_attrs[] = {
> > +     &arm_cspmu_identifier_attr.attr,
> > +     NULL,
> > +};
> > +
> > +static struct attribute_group arm_cspmu_identifier_attr_group = {
> > +     .attrs = arm_cspmu_identifier_attrs,
> > +};
> > +
> > +static const char *arm_cspmu_get_identifier(const struct arm_cspmu *cspmu)
> > +{
> > +     const char *identifier =
> > +             devm_kasprintf(cspmu->dev, GFP_KERNEL, "%x",
> > +                            cspmu->impl.pmiidr);
> > +     return identifier;
> > +}
> > +
> > +static const char *arm_cspmu_type_str[ACPI_APMT_NODE_TYPE_COUNT] = {
> > +     "mc",
> > +     "smmu",
> > +     "pcie",
> > +     "acpi",
> > +     "cache",
> > +};
> > +
> > +static const char *arm_cspmu_get_name(const struct arm_cspmu *cspmu)
> > +{
> > +     struct device *dev;
> > +     struct acpi_apmt_node *apmt_node;
> > +     u8 pmu_type;
> > +     char *name;
> > +     char acpi_hid_string[ACPI_ID_LEN] = { 0 };
> > +     static atomic_t pmu_idx[ACPI_APMT_NODE_TYPE_COUNT] = { 0 };
> > +
> > +     dev = cspmu->dev;
> > +     apmt_node = cspmu->apmt_node;
> > +     pmu_type = apmt_node->type;
> > +
> > +     if (pmu_type >= ACPI_APMT_NODE_TYPE_COUNT) {
> > +             dev_err(dev, "unsupported PMU type-%u\n", pmu_type);
> > +             return NULL;
> > +     }
> > +
> > +     if (pmu_type == ACPI_APMT_NODE_TYPE_ACPI) {
> > +             memcpy(acpi_hid_string,
> > +                     &apmt_node->inst_primary,
> > +                     sizeof(apmt_node->inst_primary));
> > +             name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%s_%u",
> PMUNAME,
> > +                                   arm_cspmu_type_str[pmu_type],
> > +                                   acpi_hid_string,
> > +                                   apmt_node->inst_secondary);
> > +     } else {
> > +             name = devm_kasprintf(dev, GFP_KERNEL, "%s_%s_%d",
> PMUNAME,
> > +                                   arm_cspmu_type_str[pmu_type],
> > +                                   atomic_fetch_inc(&pmu_idx[pmu_type]));
> > +     }
> > +
> > +     return name;
> > +}
> > +
> > +static ssize_t arm_cspmu_cpumask_show(struct device *dev,
> > +                                   struct device_attribute *attr,
> > +                                   char *buf)
> > +{
> > +     struct pmu *pmu = dev_get_drvdata(dev);
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> > +     struct dev_ext_attribute *eattr =
> > +             container_of(attr, struct dev_ext_attribute, attr);
> > +     unsigned long mask_id = (unsigned long)eattr->var;
> > +     const cpumask_t *cpumask;
> > +
> > +     switch (mask_id) {
> > +     case ARM_CSPMU_ACTIVE_CPU_MASK:
> > +             cpumask = &cspmu->active_cpu;
> > +             break;
> > +     case ARM_CSPMU_ASSOCIATED_CPU_MASK:
> > +             cpumask = &cspmu->associated_cpus;
> > +             break;
> > +     default:
> > +             return 0;
> > +     }
> > +     return cpumap_print_to_pagebuf(true, buf, cpumask);
> > +}
> > +
> > +static struct attribute *arm_cspmu_cpumask_attrs[] = {
> > +     ARM_CSPMU_CPUMASK_ATTR(cpumask, ARM_CSPMU_ACTIVE_CPU_MASK),
> > +     ARM_CSPMU_CPUMASK_ATTR(associated_cpus, ARM_CSPMU_ASSOCIATED_CPU_MASK),
> > +     NULL,
> > +};
> > +
> > +static struct attribute_group arm_cspmu_cpumask_attr_group = {
> > +     .attrs = arm_cspmu_cpumask_attrs,
> > +};
> > +
> > +struct impl_match {
> > +     u32 pmiidr;
> > +     u32 mask;
> > +     int (*impl_init_ops)(struct arm_cspmu *cspmu);
> > +};
> > +
> > +static const struct impl_match impl_match[] = {
> > +     {}
> > +};
> > +
> > +static int arm_cspmu_init_impl_ops(struct arm_cspmu *cspmu)
> > +{
> > +     int ret;
> > +     struct acpi_apmt_node *apmt_node = cspmu->apmt_node;
> > +     struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> > +     const struct impl_match *match = impl_match;
> > +
> > +     /*
> > +      * Get PMU implementer and product id from APMT node.
> > +      * If the APMT node doesn't have an implementer/product id, try to get it
> > +      * from PMIIDR.
> > +      */
> > +     cspmu->impl.pmiidr =
> > +             (apmt_node->impl_id) ? apmt_node->impl_id :
> > +                                    readl(cspmu->base0 + PMIIDR);
> > +
> > +     /* Find implementer specific attribute ops. */
> > +     for (; match->pmiidr; match++) {
> > +             const u32 mask = match->mask;
> > +
> > +             if ((match->pmiidr & mask) == (cspmu->impl.pmiidr & mask)) {
> > +                     ret = match->impl_init_ops(cspmu);
> > +                     if (ret)
> > +                             return ret;
> > +
> > +                     break;
> > +             }
> > +     }
> > +
> > +     /* Use default callbacks if implementer doesn't provide one. */
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, get_event_attrs);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, get_format_attrs);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, get_identifier);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, get_name);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, is_cycle_counter_event);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, event_type);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, event_filter);
> > +     CHECK_DEFAULT_IMPL_OPS(impl_ops, event_attr_is_visible);
> > +
> > +     return 0;
> > +}
> > +
> > +static struct attribute_group *
> > +arm_cspmu_alloc_event_attr_group(struct arm_cspmu *cspmu)
> > +{
> > +     struct attribute_group *event_group;
> > +     struct device *dev = cspmu->dev;
> > +     const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> > +
> > +     event_group =
> > +             devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> > +     if (!event_group)
> > +             return NULL;
> > +
> > +     event_group->name = "events";
> > +     event_group->attrs = impl_ops->get_event_attrs(cspmu);
> > +     event_group->is_visible = impl_ops->event_attr_is_visible;
> > +
> > +     return event_group;
> > +}
> > +
> > +static struct attribute_group *
> > +arm_cspmu_alloc_format_attr_group(struct arm_cspmu *cspmu)
> > +{
> > +     struct attribute_group *format_group;
> > +     struct device *dev = cspmu->dev;
> > +
> > +     format_group =
> > +             devm_kzalloc(dev, sizeof(struct attribute_group), GFP_KERNEL);
> > +     if (!format_group)
> > +             return NULL;
> > +
> > +     format_group->name = "format";
> > +     format_group->attrs = cspmu->impl.ops.get_format_attrs(cspmu);
> > +
> > +     return format_group;
> > +}
> > +
> > +static struct attribute_group **
> > +arm_cspmu_alloc_attr_group(struct arm_cspmu *cspmu)
> > +{
> > +     struct attribute_group **attr_groups = NULL;
> > +     struct device *dev = cspmu->dev;
> > +     const struct arm_cspmu_impl_ops *impl_ops = &cspmu->impl.ops;
> > +     int ret;
> > +
> > +     ret = arm_cspmu_init_impl_ops(cspmu);
> > +     if (ret)
> > +             return NULL;
> > +
> > +     cspmu->identifier = impl_ops->get_identifier(cspmu);
> > +     cspmu->name = impl_ops->get_name(cspmu);
> > +
> > +     if (!cspmu->identifier || !cspmu->name)
> > +             return NULL;
> > +
> > +     attr_groups = devm_kcalloc(dev, 5, sizeof(struct attribute_group *),
> > +                                GFP_KERNEL);
> > +     if (!attr_groups)
> > +             return NULL;
> > +
> > +     attr_groups[0] = arm_cspmu_alloc_event_attr_group(cspmu);
> > +     attr_groups[1] = arm_cspmu_alloc_format_attr_group(cspmu);
> > +     attr_groups[2] = &arm_cspmu_identifier_attr_group;
> > +     attr_groups[3] = &arm_cspmu_cpumask_attr_group;
> > +
> > +     if (!attr_groups[0] || !attr_groups[1])
> > +             return NULL;
> > +
> > +     return attr_groups;
> > +}
> > +
> > +static inline void arm_cspmu_reset_counters(struct arm_cspmu *cspmu)
> > +{
> > +     u32 pmcr = 0;
> > +
> > +     pmcr |= PMCR_P;
> > +     pmcr |= PMCR_C;
> > +     writel(pmcr, cspmu->base0 + PMCR);
> > +}
> > +
> > +static inline void arm_cspmu_start_counters(struct arm_cspmu *cspmu)
> > +{
> > +     writel(PMCR_E, cspmu->base0 + PMCR);
> > +}
> > +
> > +static inline void arm_cspmu_stop_counters(struct arm_cspmu *cspmu)
> > +{
> > +     writel(0, cspmu->base0 + PMCR);
> > +}
> > +
> > +static void arm_cspmu_enable(struct pmu *pmu)
> > +{
> > +     bool disabled;
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> > +
> > +     disabled = bitmap_empty(cspmu->hw_events.used_ctrs,
> > +                             cspmu->num_logical_ctrs);
> > +
> > +     if (disabled)
> > +             return;
> > +
> > +     arm_cspmu_start_counters(cspmu);
> > +}
> > +
> > +static void arm_cspmu_disable(struct pmu *pmu)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(pmu);
> > +
> > +     arm_cspmu_stop_counters(cspmu);
> > +}
> > +
> > +static int arm_cspmu_get_event_idx(struct arm_cspmu_hw_events *hw_events,
> > +                             struct perf_event *event)
> > +{
> > +     int idx;
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +
> > +     if (supports_cycle_counter(cspmu)) {
> > +             if (cspmu->impl.ops.is_cycle_counter_event(event)) {
> > +                     /* Search for available cycle counter. */
> > +                     if (test_and_set_bit(cspmu->cycle_counter_logical_idx,
> > +                                          hw_events->used_ctrs))
> > +                             return -EAGAIN;
> > +
> > +                     return cspmu->cycle_counter_logical_idx;
> > +             }
> > +
> > +             /*
> > +              * Search a regular counter from the used counter bitmap.
> > +              * The cycle counter divides the bitmap into two parts. Search
> > +              * the first then second half to exclude the cycle counter bit.
> > +              */
> > +             idx = find_first_zero_bit(hw_events->used_ctrs,
> > +                                       cspmu->cycle_counter_logical_idx);
> > +             if (idx >= cspmu->cycle_counter_logical_idx) {
> > +                     idx = find_next_zero_bit(
> > +                             hw_events->used_ctrs,
> > +                             cspmu->num_logical_ctrs,
> > +                             cspmu->cycle_counter_logical_idx + 1);
> > +             }
> > +     } else {
> > +             idx = find_first_zero_bit(hw_events->used_ctrs,
> > +                                       cspmu->num_logical_ctrs);
> > +     }
> > +
> > +     if (idx >= cspmu->num_logical_ctrs)
> > +             return -EAGAIN;
> > +
> > +     set_bit(idx, hw_events->used_ctrs);
> > +
> > +     return idx;
> > +}
> > +
> > +static bool arm_cspmu_validate_event(struct pmu *pmu,
> > +                              struct arm_cspmu_hw_events *hw_events,
> > +                              struct perf_event *event)
> > +{
> > +     if (is_software_event(event))
> > +             return true;
> > +
> > +     /* Reject groups spanning multiple HW PMUs. */
> > +     if (event->pmu != pmu)
> > +             return false;
> > +
> > +     return (arm_cspmu_get_event_idx(hw_events, event) >= 0);
> > +}
> > +
> > +/*
> > + * Make sure the group of events can be scheduled at once
> > + * on the PMU.
> > + */
> > +static bool arm_cspmu_validate_group(struct perf_event *event)
> > +{
> > +     struct perf_event *sibling, *leader = event->group_leader;
> > +     struct arm_cspmu_hw_events fake_hw_events;
> > +
> > +     if (event->group_leader == event)
> > +             return true;
> > +
> > +     memset(&fake_hw_events, 0, sizeof(fake_hw_events));
> > +
> > +     if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events, leader))
> > +             return false;
> > +
> > +     for_each_sibling_event(sibling, leader) {
> > +             if (!arm_cspmu_validate_event(event->pmu, &fake_hw_events,
> > +                                               sibling))
> > +                     return false;
> > +     }
> > +
> > +     return arm_cspmu_validate_event(event->pmu, &fake_hw_events, event);
> > +}
> > +
> > +static int arm_cspmu_event_init(struct perf_event *event)
> > +{
> > +     struct arm_cspmu *cspmu;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     cspmu = to_arm_cspmu(event->pmu);
> > +
> > +     /*
> > +      * Following other "uncore" PMUs, we do not support sampling mode or
> > +      * attach to a task (per-process mode).
> > +      */
> > +     if (is_sampling_event(event)) {
> > +             dev_dbg(cspmu->pmu.dev,
> > +                     "Can't support sampling events\n");
> > +             return -EOPNOTSUPP;
> > +     }
> > +
> > +     if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK) {
> > +             dev_dbg(cspmu->pmu.dev,
> > +                     "Can't support per-task counters\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     /*
> > +      * Make sure the CPU assignment is on one of the CPUs associated with
> > +      * this PMU.
> > +      */
> > +     if (!cpumask_test_cpu(event->cpu, &cspmu->associated_cpus)) {
> > +             dev_dbg(cspmu->pmu.dev,
> > +                     "Requested cpu is not associated with the PMU\n");
> > +             return -EINVAL;
> > +     }
> > +
> > +     /* Enforce the current active CPU to handle the events in this PMU. */
> > +     event->cpu = cpumask_first(&cspmu->active_cpu);
> > +     if (event->cpu >= nr_cpu_ids)
> > +             return -EINVAL;
> > +
> > +     if (!arm_cspmu_validate_group(event))
> > +             return -EINVAL;
> > +
> > +     /*
> > +      * The logical counter id is tracked with hw_perf_event.extra_reg.idx.
> > +      * The physical counter id is tracked with hw_perf_event.idx.
> > +      * We don't assign an index until we actually place the event onto
> > +      * hardware. Use -1 to signify that we haven't decided where to put it
> > +      * yet.
> > +      */
> > +     hwc->idx = -1;
> > +     hwc->extra_reg.idx = -1;
> > +     hwc->config = cspmu->impl.ops.event_type(event);
> > +
> > +     return 0;
> > +}
> > +
> > +static inline u32 counter_offset(u32 reg_sz, u32 ctr_idx)
> > +{
> > +     return (PMEVCNTR_LO + (reg_sz * ctr_idx));
> > +}
> > +
> > +static void arm_cspmu_write_counter(struct perf_event *event, u64 val)
> > +{
> > +     u32 offset;
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +
> > +     if (use_64b_counter_reg(cspmu)) {
> > +             offset = counter_offset(sizeof(u64), event->hw.idx);
> > +
> > +             writeq(val, cspmu->base1 + offset);
> > +     } else {
> > +             offset = counter_offset(sizeof(u32), event->hw.idx);
> > +
> > +             writel(lower_32_bits(val), cspmu->base1 + offset);
> > +     }
> > +}
> > +
> > +static u64 arm_cspmu_read_counter(struct perf_event *event)
> > +{
> > +     u32 offset;
> > +     const void __iomem *counter_addr;
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +
> > +     if (use_64b_counter_reg(cspmu)) {
> > +             offset = counter_offset(sizeof(u64), event->hw.idx);
> > +             counter_addr = cspmu->base1 + offset;
> > +
> > +             return supports_64bit_atomics(cspmu) ?
> > +                            readq(counter_addr) :
> > +                            read_reg64_hilohi(counter_addr);
> > +     }
> > +
> > +     offset = counter_offset(sizeof(u32), event->hw.idx);
> > +     return readl(cspmu->base1 + offset);
> > +}
> > +
> > +/*
> > + * arm_cspmu_set_event_period: Set the period for the counter.
> > + *
> > + * To handle cases of extreme interrupt latency, we program
> > + * the counter with half of the max count for the counters.
> > + */
> > +static void arm_cspmu_set_event_period(struct perf_event *event)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     u64 val = counter_mask(cspmu) >> 1ULL;
> > +
> > +     local64_set(&event->hw.prev_count, val);
> > +     arm_cspmu_write_counter(event, val);
> > +}
> > +
> > +static void arm_cspmu_enable_counter(struct arm_cspmu *cspmu, int idx)
> > +{
> > +     u32 reg_id, reg_bit, inten_off, cnten_off;
> > +
> > +     reg_id = COUNTER_TO_SET_CLR_ID(idx);
> > +     reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
> > +
> > +     inten_off = PMINTENSET + (4 * reg_id);
> > +     cnten_off = PMCNTENSET + (4 * reg_id);
> > +
> > +     writel(BIT(reg_bit), cspmu->base0 + inten_off);
> > +     writel(BIT(reg_bit), cspmu->base0 + cnten_off);
> > +}
> > +
> > +static void arm_cspmu_disable_counter(struct arm_cspmu *cspmu, int idx)
> > +{
> > +     u32 reg_id, reg_bit, inten_off, cnten_off;
> > +
> > +     reg_id = COUNTER_TO_SET_CLR_ID(idx);
> > +     reg_bit = COUNTER_TO_SET_CLR_BIT(idx);
> > +
> > +     inten_off = PMINTENCLR + (4 * reg_id);
> > +     cnten_off = PMCNTENCLR + (4 * reg_id);
> > +
> > +     writel(BIT(reg_bit), cspmu->base0 + cnten_off);
> > +     writel(BIT(reg_bit), cspmu->base0 + inten_off);
> > +}
> > +
> > +static void arm_cspmu_event_update(struct perf_event *event)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     u64 delta, prev, now;
> > +
> > +     do {
> > +             prev = local64_read(&hwc->prev_count);
> > +             now = arm_cspmu_read_counter(event);
> > +     } while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> > +
> > +     delta = (now - prev) & counter_mask(cspmu);
> > +     local64_add(delta, &event->count);
> > +}
> > +
> > +static inline void arm_cspmu_set_event(struct arm_cspmu *cspmu,
> > +                                     struct hw_perf_event *hwc)
> > +{
> > +     u32 offset = PMEVTYPER + (4 * hwc->idx);
> > +
> > +     writel(hwc->config, cspmu->base0 + offset);
> > +}
> > +
> > +static inline void arm_cspmu_set_ev_filter(struct arm_cspmu *cspmu,
> > +                                        struct hw_perf_event *hwc,
> > +                                        u32 filter)
> > +{
> > +     u32 offset = PMEVFILTR + (4 * hwc->idx);
> > +
> > +     writel(filter, cspmu->base0 + offset);
> > +}
> > +
> > +static inline void arm_cspmu_set_cc_filter(struct arm_cspmu *cspmu, u32 filter)
> > +{
> > +     u32 offset = PMCCFILTR;
> > +
> > +     writel(filter, cspmu->base0 + offset);
> > +}
> > +
> > +static void arm_cspmu_start(struct perf_event *event, int pmu_flags)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     u32 filter;
> > +
> > +     /* We always reprogram the counter */
> > +     if (pmu_flags & PERF_EF_RELOAD)
> > +             WARN_ON(!(hwc->state & PERF_HES_UPTODATE));
> > +
> > +     arm_cspmu_set_event_period(event);
> > +
> > +     filter = cspmu->impl.ops.event_filter(event);
> > +
> > +     if (event->hw.extra_reg.idx == cspmu->cycle_counter_logical_idx) {
> > +             arm_cspmu_set_cc_filter(cspmu, filter);
> > +     } else {
> > +             arm_cspmu_set_event(cspmu, hwc);
> > +             arm_cspmu_set_ev_filter(cspmu, hwc, filter);
> > +     }
> > +
> > +     hwc->state = 0;
> > +
> > +     arm_cspmu_enable_counter(cspmu, hwc->idx);
> > +}
> > +
> > +static void arm_cspmu_stop(struct perf_event *event, int pmu_flags)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     struct hw_perf_event *hwc = &event->hw;
> > +
> > +     if (hwc->state & PERF_HES_STOPPED)
> > +             return;
> > +
> > +     arm_cspmu_disable_counter(cspmu, hwc->idx);
> > +     arm_cspmu_event_update(event);
> > +
> > +     hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> > +}
> > +
> > +static inline u32 to_phys_idx(struct arm_cspmu *cspmu, u32 idx)
> > +{
> > +     return (idx == cspmu->cycle_counter_logical_idx) ?
> > +             ARM_CSPMU_CYCLE_CNTR_IDX : idx;
> > +}
> > +
> > +static int arm_cspmu_add(struct perf_event *event, int flags)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     int idx;
> > +
> > +     if (WARN_ON_ONCE(!cpumask_test_cpu(smp_processor_id(),
> > +                                        &cspmu->associated_cpus)))
> > +             return -ENOENT;
> > +
> > +     idx = arm_cspmu_get_event_idx(hw_events, event);
> > +     if (idx < 0)
> > +             return idx;
> > +
> > +     hw_events->events[idx] = event;
> > +     hwc->idx = to_phys_idx(cspmu, idx);
> > +     hwc->extra_reg.idx = idx;
> > +     hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> > +
> > +     if (flags & PERF_EF_START)
> > +             arm_cspmu_start(event, PERF_EF_RELOAD);
> > +
> > +     /* Propagate changes to the userspace mapping. */
> > +     perf_event_update_userpage(event);
> > +
> > +     return 0;
> > +}
> > +
> > +static void arm_cspmu_del(struct perf_event *event, int flags)
> > +{
> > +     struct arm_cspmu *cspmu = to_arm_cspmu(event->pmu);
> > +     struct arm_cspmu_hw_events *hw_events = &cspmu->hw_events;
> > +     struct hw_perf_event *hwc = &event->hw;
> > +     int idx = hwc->extra_reg.idx;
> > +
> > +     arm_cspmu_stop(event, PERF_EF_UPDATE);
> > +
> > +     hw_events->events[idx] = NULL;
> > +
> > +     clear_bit(idx, hw_events->used_ctrs);
> > +
> > +     perf_event_update_userpage(event);
> > +}
> > +
> > +static void arm_cspmu_read(struct perf_event *event)
> > +{
> > +     arm_cspmu_event_update(event);
> > +}
> > +
> > +static struct arm_cspmu *arm_cspmu_alloc(struct platform_device *pdev)
> > +{
> > +     struct acpi_apmt_node *apmt_node;
> > +     struct arm_cspmu *cspmu;
> > +     struct device *dev;
> > +
> > +     dev = &pdev->dev;
> > +     apmt_node = *(struct acpi_apmt_node **)dev_get_platdata(dev);
> > +     if (!apmt_node) {
> > +             dev_err(dev, "failed to get APMT node\n");
> > +             return NULL;
> > +     }
> > +
> > +     cspmu = devm_kzalloc(dev, sizeof(*cspmu), GFP_KERNEL);
> > +     if (!cspmu)
> > +             return NULL;
> > +
> > +     cspmu->dev = dev;
> > +     cspmu->apmt_node = apmt_node;
> > +
> > +     platform_set_drvdata(pdev, cspmu);
> > +
> > +     return cspmu;
> > +}
> > +
> > +static int arm_cspmu_init_mmio(struct arm_cspmu *cspmu)
> > +{
> > +     struct device *dev;
> > +     struct platform_device *pdev;
> > +     struct acpi_apmt_node *apmt_node;
> > +
> > +     dev = cspmu->dev;
> > +     pdev = to_platform_device(dev);
> > +     apmt_node = cspmu->apmt_node;
> > +
> > +     /* Base address for page 0. */
> > +     cspmu->base0 = devm_platform_ioremap_resource(pdev, 0);
> > +     if (IS_ERR(cspmu->base0)) {
> > +             dev_err(dev, "ioremap failed for page-0 resource\n");
> > +             return PTR_ERR(cspmu->base0);
> > +     }
> > +
> > +     /* Base address for page 1 if supported. Otherwise point to page 0. */
> > +     cspmu->base1 = cspmu->base0;
> > +     if (CHECK_APMT_FLAG(apmt_node->flags, DUAL_PAGE, SUPP)) {
> > +             cspmu->base1 = devm_platform_ioremap_resource(pdev, 1);
> > +             if (IS_ERR(cspmu->base1)) {
> > +                     dev_err(dev, "ioremap failed for page-1 resource\n");
> > +                     return PTR_ERR(cspmu->base1);
> > +             }
> > +     }
> > +
> > +     cspmu->pmcfgr = readl(cspmu->base0 + PMCFGR);
> > +
> > +     cspmu->num_logical_ctrs = FIELD_GET(PMCFGR_N, cspmu->pmcfgr) + 1;
> > +
> > +     cspmu->cycle_counter_logical_idx = ARM_CSPMU_MAX_HW_CNTRS;
> > +
> > +     if (supports_cycle_counter(cspmu)) {
> > +             /*
> > +              * The last logical counter is mapped to cycle counter if
> > +              * there is a gap between regular and cycle counter. Otherwise,
> > +              * logical and physical have 1-to-1 mapping.
> > +              */
> > +             cspmu->cycle_counter_logical_idx =
> > +                     (cspmu->num_logical_ctrs <= ARM_CSPMU_CYCLE_CNTR_IDX) ?
> > +                             cspmu->num_logical_ctrs - 1 :
> > +                             ARM_CSPMU_CYCLE_CNTR_IDX;
> > +     }
> > +
> > +     cspmu->num_set_clr_reg =
> > +             DIV_ROUND_UP(cspmu->num_logical_ctrs,
> > +                             ARM_CSPMU_SET_CLR_COUNTER_NUM);
> > +
> > +     cspmu->hw_events.events =
> > +             devm_kcalloc(dev, cspmu->num_logical_ctrs,
> > +                          sizeof(*cspmu->hw_events.events), GFP_KERNEL);
> > +
> > +     if (!cspmu->hw_events.events)
> > +             return -ENOMEM;
> > +
> > +     return 0;
> > +}
> > +
> > +static inline int arm_cspmu_get_reset_overflow(struct arm_cspmu *cspmu,
> > +                                            u32 *pmovs)
> > +{
> > +     int i;
> > +     u32 pmovclr_offset = PMOVSCLR;
> > +     u32 has_overflowed = 0;
> > +
> > +     for (i = 0; i < cspmu->num_set_clr_reg; ++i) {
> > +             pmovs[i] = readl(cspmu->base1 + pmovclr_offset);
> > +             has_overflowed |= pmovs[i];
> > +             writel(pmovs[i], cspmu->base1 + pmovclr_offset);
> > +             pmovclr_offset += sizeof(u32);
> > +     }
> > +
> > +     return has_overflowed != 0;
> > +}
> > +
> > +static irqreturn_t arm_cspmu_handle_irq(int irq_num, void *dev)
> > +{
> > +     int idx, has_overflowed;
> > +     struct perf_event *event;
> > +     struct arm_cspmu *cspmu = dev;
> > +     u32 pmovs[ARM_CSPMU_SET_CLR_MAX_NUM] = { 0 };
> 
> nit: Could we not reuse what we do for hw_events.used_ctrs ?
> 
> i.e., DECLARE_BITMAP(pmovs, ARM_CSPMU_MAX_HW_CNTRS)
> 
> 
> And remove ARM_CSPMU_SET_CLR_MAX_NUM altogether and the cast below
> to (unsigned long *).
> 

Sure, I will update the patch with your suggestion.
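
Roughly like the following untested sketch (keeping the helper's u32 word
layout for now, so the cast stays until the helper is reworked to take the
bitmap directly):

    static irqreturn_t arm_cspmu_handle_irq(int irq_num, void *dev)
    {
            int idx;
            struct perf_event *event;
            struct arm_cspmu *cspmu = dev;
            DECLARE_BITMAP(pmovs, ARM_CSPMU_MAX_HW_CNTRS);

            bitmap_zero(pmovs, ARM_CSPMU_MAX_HW_CNTRS);
            if (!arm_cspmu_get_reset_overflow(cspmu, (u32 *)pmovs))
                    return IRQ_NONE;

            for_each_set_bit(idx, pmovs, cspmu->num_logical_ctrs) {
                    event = cspmu->hw_events.events[idx];
                    if (!event)
                            continue;
                    arm_cspmu_event_update(event);
                    arm_cspmu_set_event_period(event);
            }

            return IRQ_HANDLED;
    }
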

> With that
> 
> Reviewed-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 

Thanks!

> Suzuki

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  2022-09-27 11:42   ` Suzuki K Poulose
@ 2022-09-28  1:38     ` Besar Wicaksono
  2022-09-28 10:47       ` Suzuki K Poulose
  0 siblings, 1 reply; 13+ messages in thread
From: Besar Wicaksono @ 2022-09-28  1:38 UTC (permalink / raw)
  To: Suzuki K Poulose, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan



> -----Original Message-----
> From: Suzuki K Poulose <suzuki.poulose@arm.com>
> Sent: Tuesday, September 27, 2022 6:43 AM
> To: Besar Wicaksono <bwicaksono@nvidia.com>; robin.murphy@arm.com;
> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
> Sethi <vsethi@nvidia.com>; mathieu.poirier@linaro.org;
> mike.leach@linaro.org; leo.yan@linaro.org
> Subject: Re: [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF
> and MCF attribute
> 
> On 14/08/2022 19:23, Besar Wicaksono wrote:
> > Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
> > Fabric (MCF) PMU attributes for CoreSight PMU implementation in
> > NVIDIA devices.
> >
> > Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>
> > ---
> >   Documentation/admin-guide/perf/index.rst      |   1 +
> >   Documentation/admin-guide/perf/nvidia-pmu.rst | 120 ++++++
> >   drivers/perf/arm_cspmu/Makefile               |   3 +-
> >   drivers/perf/arm_cspmu/arm_cspmu.c            |   7 +
> >   drivers/perf/arm_cspmu/nvidia_cspmu.c         | 367 ++++++
> >   drivers/perf/arm_cspmu/nvidia_cspmu.h         |  17 +
> >   6 files changed, 514 insertions(+), 1 deletion(-)
> >   create mode 100644 Documentation/admin-guide/perf/nvidia-pmu.rst
> >   create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.c
> >   create mode 100644 drivers/perf/arm_cspmu/nvidia_cspmu.h
> >
> > diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> > index 69b23f087c05..cf05fed1f67f 100644
> > --- a/Documentation/admin-guide/perf/index.rst
> > +++ b/Documentation/admin-guide/perf/index.rst
> > @@ -17,3 +17,4 @@ Performance monitor support
> >      xgene-pmu
> >      arm_dsu_pmu
> >      thunderx2-pmu
> > +   nvidia-pmu
> > diff --git a/Documentation/admin-guide/perf/nvidia-pmu.rst b/Documentation/admin-guide/perf/nvidia-pmu.rst
> > new file mode 100644
> > index 000000000000..c41b93965824
> > --- /dev/null
> > +++ b/Documentation/admin-guide/perf/nvidia-pmu.rst
> > @@ -0,0 +1,120 @@
> > +=========================================================
> > +NVIDIA Tegra SoC Uncore Performance Monitoring Unit (PMU)
> > +=========================================================
> > +
> > +The NVIDIA Tegra SoC includes various system PMUs to measure key performance
> > +metrics like memory bandwidth, latency, and utilization:
> > +
> > +* Scalable Coherency Fabric (SCF)
> > +* Memory Controller Fabric (MCF) GPU physical interface
> > +* MCF GPU virtual interface
> > +* MCF NVLINK interface
> > +* MCF PCIE interface
> > +
> > +PMU Driver
> > +----------
> > +
> > +The PMUs in this document are based on the ARM CoreSight PMU Architecture as
> > +described in document ARM IHI 0091. Since this is a standard architecture, the
> > +PMUs are managed by a common driver, "arm-cs-arch-pmu". This driver describes
> > +the available events and configuration of each PMU in sysfs. Please see the
> > +sections below to get the sysfs path of each PMU. Like other uncore PMU
> > +drivers, this driver provides a "cpumask" sysfs attribute to show the CPU id
> > +used to handle the PMU event. There is also an "associated_cpus" sysfs
> > +attribute, which contains a list of CPUs associated with the PMU instance.
> > +
> > +SCF PMU
> > +-------
> > +
> > +The SCF PMU monitors system level cache events, CPU traffic, and
> > +strongly-ordered PCIE traffic to local/remote memory.
> > +
> > +The events and configuration options of this PMU device are described in sysfs,
> > +see /sys/bus/event_source/devices/nvidia_scf_pmu_<socket-id>.
> > +
> > +Example usage::
> > +
> > +  perf stat -a -e nvidia_scf_pmu_0/config=0x0/
> > +
> > +This will count the events in socket 0.
> > +
> > +MCF GPU Physical PMU
> > +--------------------
> > +
> > +The MCF GPU physical PMU monitors ATS translated traffic from GPU to
> > +local/remote memory via Nvlink C2C.
> > +
> > +The events and configuration options of this PMU device are described in sysfs,
> > +see /sys/bus/event_source/devices/nvidia_mcf_gpu_pmu_<socket-id>.
> > +
> > +Multiple GPUs can be connected to the SoC. The user can use "gpu"
> bitmap
> > +parameter to select the GPU(s) to monitor, i.e. "gpu=0xF" corresponds to
> GPU 0
> > +to 3. /sys/bus/event_sources/devices/nvidia_mcf_gpu_pmu_<socket-
> id>/format/gpu
> > +shows the valid bits that can be set in the "gpu" parameter.
> > +
> > +Example usage::
> > +
> > +  perf stat -a -e nvidia_mcf_gpu_pmu_0/config=0x0,gpu=0x3/
> > +
> > +This will count the events on GPU 0 and 1 that are connected to the SoC in
> > +socket 0.
> > +
> > +MCF GPU Virtual PMU
> > +-------------------
> > +
> > +The MCF GPU virtual PMU monitors SMMU inline translated traffic (as opposed
> > +to ATS) from GPU to local/remote memory via Nvlink C2C.
> > +
> > +The events and configuration options of this PMU device are described in sysfs,
> > +see /sys/bus/event_source/devices/nvidia_mcf_gpuvir_pmu_<socket-id>.
> > +
> > +Multiple GPUs can be connected to the SoC. The user can use "gpu"
> bitmap
> > +parameter to select the GPU(s) to monitor, i.e. "gpu=0xF" corresponds to
> GPU 0
> > +to 3. /sys/bus/event_sources/devices/nvidia_mcf_gpuvir_pmu_<socket-
> id>/format/gpu
> > +shows the valid bits that can be set in the "gpu" parameter.
> > +
> > +Example usage::
> > +
> > +  perf stat -a -e nvidia_mcf_gpuvir_pmu_0/config=0x0,gpu=0x3/
> > +
> > +This will count the events on GPU 0 and 1 that are connected to the SoC in
> > +socket 0.
> > +
> > +MCF NVLINK PMU
> > +--------------
> > +
> > +The MCF NVLINK PMU monitors I/O coherent traffic from an external socket to
> > +local memory.
> > +
> > +The events and configuration options of this PMU device are described in sysfs,
> > +see /sys/bus/event_source/devices/nvidia_mcf_nvlink_pmu_<socket-id>.
> > +
> > +Each SoC socket can be connected to one or more sockets via NVLINK. The user
> > +can use the "rem_socket" bitmap parameter to select the remote socket(s) to
> > +monitor, e.g. "rem_socket=0xE" corresponds to sockets 1 to 3.
> > +/sys/bus/event_source/devices/nvidia_mcf_nvlink_pmu_<socket-id>/format/rem_socket
> > +shows the valid bits that can be set in the "rem_socket" parameter.
> > +
> > +Example usage::
> > +
> > +  perf stat -a -e nvidia_mcf_nvlink_pmu_0/config=0x0,rem_socket=0x6/
> > +
> > +This will count the events from remote socket 1 and 2 to socket 0.
> > +
> > +MCF PCIE PMU
> > +------------
> > +
> > +The MCF PCIE PMU monitors traffic from PCIE root ports to local/remote memory.
> > +
> > +The events and configuration options of this PMU device are described in sysfs,
> > +see /sys/bus/event_source/devices/nvidia_mcf_pcie_pmu_<socket-id>.
> > +
> > +Each SoC socket can support multiple root ports. The user can use the
> > +"root_port" bitmap parameter to select the port(s) to monitor, e.g.
> > +"root_port=0xF" corresponds to root ports 0 to 3.
> > +/sys/bus/event_source/devices/nvidia_mcf_pcie_pmu_<socket-id>/format/root_port
> > +shows the valid bits that can be set in the "root_port" parameter.
> > +
> > +Example usage::
> > +
> > +  perf stat -a -e nvidia_mcf_pcie_pmu_0/config=0x0,root_port=0x3/
> > +
> > +This will count the events from root port 0 and 1 of socket 0.
> > diff --git a/drivers/perf/arm_cspmu/Makefile b/drivers/perf/arm_cspmu/Makefile
> > index cdc3455f74d8..1b586064bd77 100644
> > --- a/drivers/perf/arm_cspmu/Makefile
> > +++ b/drivers/perf/arm_cspmu/Makefile
> > @@ -3,4 +3,5 @@
> >   # SPDX-License-Identifier: GPL-2.0
> >
> >   obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += \
> > -     arm_cspmu.o
> > +     arm_cspmu.o \
> > +     nvidia_cspmu.o
> > diff --git a/drivers/perf/arm_cspmu/arm_cspmu.c b/drivers/perf/arm_cspmu/arm_cspmu.c
> > index 410876f86eb0..7a0beb515e53 100644
> > --- a/drivers/perf/arm_cspmu/arm_cspmu.c
> > +++ b/drivers/perf/arm_cspmu/arm_cspmu.c
> > @@ -31,6 +31,7 @@
> >   #include <acpi/processor.h>
> >
> >   #include "arm_cspmu.h"
> > +#include "nvidia_cspmu.h"
> >
> >   #define PMUNAME "arm_cspmu"
> >   #define DRVNAME "arm-cs-arch-pmu"
> > @@ -118,6 +119,9 @@ static_assert(
> >                       ops->callback = arm_cspmu_ ## callback; \
> >       } while (0)
> >
> > +/* JEDEC-assigned JEP106 identification code */
> > +#define ARM_CSPMU_IMPL_ID_NVIDIA             0x36B
> > +
> >   static unsigned long arm_cspmu_cpuhp_state;
> >
> >   /*
> > @@ -369,6 +373,9 @@ struct impl_match {
> >   };
> >
> >   static const struct impl_match impl_match[] = {
> > +     { .pmiidr = ARM_CSPMU_IMPL_ID_NVIDIA,
> > +       .mask = ARM_CSPMU_PMIIDR_IMPLEMENTER,
> > +       .impl_init_ops = nv_cspmu_init_ops },
> 
> Super minor nit: Coding style. Could we use :
> 
>         {
>                 .field = value,
>                 ...
>         },
> 
> >       {}
> >   };
> >
> > diff --git a/drivers/perf/arm_cspmu/nvidia_cspmu.c b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> > new file mode 100644
> > index 000000000000..261f20680bc1
> > --- /dev/null
> > +++ b/drivers/perf/arm_cspmu/nvidia_cspmu.c
> > @@ -0,0 +1,367 @@
> > +// SPDX-License-Identifier: GPL-2.0
> > +/*
> > + * Copyright (c) 2022, NVIDIA CORPORATION & AFFILIATES.
> > + *
> > + */
> > +
> > +/* Support for NVIDIA specific attributes. */
> > +
> > +#include "nvidia_cspmu.h"
> > +
> > +#define NV_MCF_PCIE_PORT_COUNT       10ULL
> > +#define NV_MCF_PCIE_FILTER_ID_MASK   GENMASK_ULL(NV_MCF_PCIE_PORT_COUNT - 1, 0)
> > +
> > +#define NV_MCF_GPU_PORT_COUNT        2ULL
> > +#define NV_MCF_GPU_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_GPU_PORT_COUNT - 1, 0)
> > +
> > +#define NV_MCF_NVL_PORT_COUNT        4ULL
> > +#define NV_MCF_NVL_FILTER_ID_MASK    GENMASK_ULL(NV_MCF_NVL_PORT_COUNT - 1, 0)
> > +
> > +#define NV_SCF_MCF_PRODID_MASK       GENMASK(31, 0)
> > +
> > +#define NV_FORMAT_NAME_GENERIC       0
> > +
> > +#define to_nv_cspmu_ctx(cspmu)       ((struct nv_cspmu_ctx *)(cspmu->impl.ctx))
> > +
> > +#define NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _num, _suff, _config)       \
> > +     ARM_CSPMU_EVENT_ATTR(_pref##_num##_suff, _config)
> > +
> > +#define NV_CSPMU_EVENT_ATTR_4(_pref, _suff, _config)                 \
> > +     NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _0_, _suff, _config),        \
> > +     NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _1_, _suff, _config + 1),    \
> > +     NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _2_, _suff, _config + 2),    \
> > +     NV_CSPMU_EVENT_ATTR_4_INNER(_pref, _3_, _suff, _config + 3)
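
Each NV_CSPMU_EVENT_ATTR_4() use below expands to four attributes, with the
unit number pasted into the name and consecutive config values. For
example,

    NV_CSPMU_EVENT_ATTR_4(socket, rd_data, 0x101)

becomes

    ARM_CSPMU_EVENT_ATTR(socket_0_rd_data, 0x101),
    ARM_CSPMU_EVENT_ATTR(socket_1_rd_data, 0x102),
    ARM_CSPMU_EVENT_ATTR(socket_2_rd_data, 0x103),
    ARM_CSPMU_EVENT_ATTR(socket_3_rd_data, 0x104),
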
> > +
> > +struct nv_cspmu_ctx {
> > +     const char *name;
> > +     u32 filter_mask;
> > +     struct attribute **event_attr;
> > +     struct attribute **format_attr;
> > +};
> > +
> > +static struct attribute *scf_pmu_event_attrs[] = {
> > +     ARM_CSPMU_EVENT_ATTR(bus_cycles,                        0x1d),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(scf_cache_allocate,                0xF0),
> > +     ARM_CSPMU_EVENT_ATTR(scf_cache_refill,                  0xF1),
> > +     ARM_CSPMU_EVENT_ATTR(scf_cache,                         0xF2),
> > +     ARM_CSPMU_EVENT_ATTR(scf_cache_wb,                      0xF3),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(socket, rd_data,                  0x101),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, dl_rsp,                   0x105),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wb_data,                  0x109),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, ev_rsp,                   0x10d),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, prb_data,                 0x111),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(socket, rd_outstanding,           0x115),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, dl_outstanding,           0x119),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wb_outstanding,           0x11d),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wr_outstanding,           0x121),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, ev_outstanding,           0x125),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, prb_outstanding,          0x129),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(socket, rd_access,                0x12d),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, dl_access,                0x131),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wb_access,                0x135),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wr_access,                0x139),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, ev_access,                0x13d),
> > +     NV_CSPMU_EVENT_ATTR_4(socket, prb_access,               0x141),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_data,                0x145),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_access,              0x149),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_access,              0x14d),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_rd_outstanding,         0x151),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_outstanding,         0x155),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_data,                 0x159),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_access,               0x15d),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_access,               0x161),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_rd_outstanding,          0x165),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_outstanding,          0x169),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(gmem_rd_data,                      0x16d),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_rd_access,                    0x16e),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_rd_outstanding,               0x16f),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_dl_rsp,                       0x170),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_dl_access,                    0x171),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_dl_outstanding,               0x172),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wb_data,                      0x173),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wb_access,                    0x174),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wb_outstanding,               0x175),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_ev_rsp,                       0x176),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_ev_access,                    0x177),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_ev_outstanding,               0x178),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wr_data,                      0x179),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wr_outstanding,               0x17a),
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wr_access,                    0x17b),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(socket, wr_data,                  0x17c),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_data,                0x180),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_data,                0x184),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wr_access,              0x188),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, gmem_wb_outstanding,         0x18c),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_data,                 0x190),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_data,                 0x194),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wr_access,               0x198),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, rem_wb_outstanding,          0x19c),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(gmem_wr_total_bytes,               0x1a0),
> > +     ARM_CSPMU_EVENT_ATTR(remote_socket_wr_total_bytes,      0x1a1),
> > +     ARM_CSPMU_EVENT_ATTR(remote_socket_rd_data,             0x1a2),
> > +     ARM_CSPMU_EVENT_ATTR(remote_socket_rd_outstanding,      0x1a3),
> > +     ARM_CSPMU_EVENT_ATTR(remote_socket_rd_access,           0x1a4),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(cmem_rd_data,                      0x1a5),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_rd_access,                    0x1a6),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_rd_outstanding,               0x1a7),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_dl_rsp,                       0x1a8),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_dl_access,                    0x1a9),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_dl_outstanding,               0x1aa),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wb_data,                      0x1ab),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wb_access,                    0x1ac),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wb_outstanding,               0x1ad),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_ev_rsp,                       0x1ae),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_ev_access,                    0x1af),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_ev_outstanding,               0x1b0),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wr_data,                      0x1b1),
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wr_outstanding,               0x1b2),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_data,                0x1b3),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_access,              0x1b7),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_access,              0x1bb),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_rd_outstanding,         0x1bf),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_outstanding,         0x1c3),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(ocu_prb_access,                    0x1c7),
> > +     ARM_CSPMU_EVENT_ATTR(ocu_prb_data,                      0x1c8),
> > +     ARM_CSPMU_EVENT_ATTR(ocu_prb_outstanding,               0x1c9),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wr_access,                    0x1ca),
> > +
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_access,              0x1cb),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_data,                0x1cf),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wr_data,                0x1d3),
> > +     NV_CSPMU_EVENT_ATTR_4(ocu, cmem_wb_outstanding,         0x1d7),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(cmem_wr_total_bytes,               0x1db),
> > +
> > +     ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *mcf_pmu_event_attrs[] = {
> > +     ARM_CSPMU_EVENT_ATTR(rd_bytes_loc,                      0x0),
> > +     ARM_CSPMU_EVENT_ATTR(rd_bytes_rem,                      0x1),
> > +     ARM_CSPMU_EVENT_ATTR(wr_bytes_loc,                      0x2),
> > +     ARM_CSPMU_EVENT_ATTR(wr_bytes_rem,                      0x3),
> > +     ARM_CSPMU_EVENT_ATTR(total_bytes_loc,                   0x4),
> > +     ARM_CSPMU_EVENT_ATTR(total_bytes_rem,                   0x5),
> > +     ARM_CSPMU_EVENT_ATTR(rd_req_loc,                        0x6),
> > +     ARM_CSPMU_EVENT_ATTR(rd_req_rem,                        0x7),
> > +     ARM_CSPMU_EVENT_ATTR(wr_req_loc,                        0x8),
> > +     ARM_CSPMU_EVENT_ATTR(wr_req_rem,                        0x9),
> > +     ARM_CSPMU_EVENT_ATTR(total_req_loc,                     0xa),
> > +     ARM_CSPMU_EVENT_ATTR(total_req_rem,                     0xb),
> > +     ARM_CSPMU_EVENT_ATTR(rd_cum_outs_loc,                   0xc),
> > +     ARM_CSPMU_EVENT_ATTR(rd_cum_outs_rem,                   0xd),
> > +     ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *generic_pmu_event_attrs[] = {
> > +     ARM_CSPMU_EVENT_ATTR(cycles, ARM_CSPMU_EVT_CYCLES_DEFAULT),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *scf_pmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     NULL,
> > +};
> > +
> > +static struct attribute *mcf_pcie_pmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     ARM_CSPMU_FORMAT_ATTR(root_port, "config1:0-9"),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *mcf_gpu_pmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     ARM_CSPMU_FORMAT_ATTR(gpu, "config1:0-1"),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *mcf_nvlink_pmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     ARM_CSPMU_FORMAT_ATTR(rem_socket, "config1:0-3"),
> > +     NULL,
> > +};
> > +
> > +static struct attribute *generic_pmu_format_attrs[] = {
> > +     ARM_CSPMU_FORMAT_EVENT_ATTR,
> > +     ARM_CSPMU_FORMAT_FILTER_ATTR,
> > +     NULL,
> > +};
> > +
> > +static struct attribute **
> > +nv_cspmu_get_event_attrs(const struct arm_cspmu *cspmu)
> > +{
> > +     const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> > +
> > +     return ctx->event_attr;
> > +}
> > +
> > +static struct attribute **
> > +nv_cspmu_get_format_attrs(const struct arm_cspmu *cspmu)
> > +{
> > +     const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> > +
> > +     return ctx->format_attr;
> > +}
> > +
> > +static const char *
> > +nv_cspmu_get_name(const struct arm_cspmu *cspmu)
> > +{
> > +     const struct nv_cspmu_ctx *ctx = to_nv_cspmu_ctx(cspmu);
> > +
> > +     return ctx->name;
> > +}
> > +
> > +static u32 nv_cspmu_event_filter(const struct perf_event *event)
> > +{
> > +     const struct nv_cspmu_ctx *ctx =
> > +             to_nv_cspmu_ctx(to_arm_cspmu(event->pmu));
> > +
> > +     return event->attr.config1 & ctx->filter_mask;
> > +}
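
Whatever the user passes in config1 is thus reduced to the bits this PMU
instance actually implements. A standalone sketch (plain C, not driver
code, using the MCF PCIE mask as the example):

    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
            /* NV_MCF_PCIE_FILTER_ID_MASK = GENMASK_ULL(9, 0) */
            uint64_t filter_mask = (1ULL << 10) - 1;
            uint64_t config1 = 0xc03;   /* root_port=0x3 plus stray high bits */

            /* only the implemented root-port bits survive: prints 0x3 */
            printf("0x%llx\n", (unsigned long long)(config1 & filter_mask));
            return 0;
    }
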
> > +
> > +enum nv_cspmu_name_fmt {
> > +     NAME_FMT_GENERIC,
> > +     NAME_FMT_PROC
> > +};
> > +
> > +struct nv_cspmu_match {
> > +     u32 prodid;
> > +     u32 prodid_mask;
> > +     u64 filter_mask;
> > +     const char *name_pattern;
> > +     enum nv_cspmu_name_fmt name_fmt;
> > +     struct attribute **event_attr;
> > +     struct attribute **format_attr;
> > +};
> > +
> > +static const struct nv_cspmu_match nv_cspmu_match[] = {
> 
> Similar coding style nit below.
> 

Sure, I will update this.

> 
> Otherwise,
> 
> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>

Thanks!

Unfortunately, we need to update the name of the PMUs and remove
some of the attributes in the NVIDIA implementation. This requires a change
in nvidia_cspmu.c and nvidia-pmu.rst. I hope you are fine if I include this
change in the v5 patch.

Regards,
Besar

^ permalink raw reply	[flat|nested] 13+ messages in thread

* RE: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver
  2022-09-22 13:52   ` Will Deacon
  2022-09-27  3:59     ` Besar Wicaksono
@ 2022-09-28  8:31     ` Michael Williams (ATG)
  1 sibling, 0 replies; 13+ messages in thread
From: Michael Williams (ATG) @ 2022-09-28  8:31 UTC (permalink / raw)
  To: Will Deacon, Besar Wicaksono
  Cc: Suzuki Poulose, Robin Murphy, Catalin Marinas, Mark Rutland,
	linux-arm-kernel, linux-kernel, linux-tegra, Sudeep Holla,
	Thanu Rangarajan, treding, jonathanh, vsethi, mathieu.poirier,
	mike.leach, leo.yan

Hi Will,

> -----Original Message-----
> From: Will Deacon <will@kernel.org>
> Sent: 22 September 2022 14:53
> To: Besar Wicaksono <bwicaksono@nvidia.com>
> Subject: Re: [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight
> PMU driver

[...]

> > +/* Check if PMU supports 64-bit single copy atomic. */
> > +static inline bool supports_64bit_atomics(const struct arm_cspmu *cspmu)
> > +{
> > +	return CHECK_APMT_FLAG(cspmu->apmt_node->flags, ATOMIC, SUPP);
> > +}
> 
> Is this just there because the architecture permits it, or are folks
> actually hanging these things off 32-bit MMIO buses on arm64 SoCs?

The CPU PMU is often exposed on the CoreSight APB bus (32-bit), and although
this driver wouldn't normally be used to access that PMU, I wouldn't rule out
similar legacy APB and AHB interfaces being used for other PMUs. A further
issue is that the CoreSight PMU model includes a number of 32-bit control
registers.

Since issue H.a there is an alternative 64-bit native PMU interface described
in the Arm ARM, which must support 64-bit atomic accesses. You might expect
this to also appear in the CoreSight PMU at some point soon. That would need
some additional updates to this driver, because all of the registers in that
interface are 64-bit, which changes some offsets.
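
For implementations without 64-bit single-copy atomicity, the usual fallback
is a hi-lo-hi re-read loop, roughly like the sketch below (illustration only,
not code from this patch):

    static u64 read_counter_hilohi(void __iomem *reg)
    {
            u32 hi, lo;

            do {
                    hi = readl(reg + 4);
                    lo = readl(reg);
            } while (hi != readl(reg + 4));

            return ((u64)hi << 32) | lo;
    }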

Regards,
Mike.


^ permalink raw reply	[flat|nested] 13+ messages in thread

* Re: [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute
  2022-09-28  1:38     ` Besar Wicaksono
@ 2022-09-28 10:47       ` Suzuki K Poulose
  0 siblings, 0 replies; 13+ messages in thread
From: Suzuki K Poulose @ 2022-09-28 10:47 UTC (permalink / raw)
  To: Besar Wicaksono, robin.murphy, catalin.marinas, will, mark.rutland
  Cc: linux-arm-kernel, linux-kernel, linux-tegra, sudeep.holla,
	thanu.rangarajan, Michael.Williams, Thierry Reding,
	Jonathan Hunter, Vikram Sethi, mathieu.poirier, mike.leach,
	leo.yan

On 28/09/2022 02:38, Besar Wicaksono wrote:
> 
> 
>> -----Original Message-----
>> From: Suzuki K Poulose <suzuki.poulose@arm.com>
>> Sent: Tuesday, September 27, 2022 6:43 AM
>> To: Besar Wicaksono <bwicaksono@nvidia.com>; robin.murphy@arm.com;
>> catalin.marinas@arm.com; will@kernel.org; mark.rutland@arm.com
>> Cc: linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org;
>> linux-tegra@vger.kernel.org; sudeep.holla@arm.com;
>> thanu.rangarajan@arm.com; Michael.Williams@arm.com; Thierry Reding
>> <treding@nvidia.com>; Jonathan Hunter <jonathanh@nvidia.com>; Vikram
>> Sethi <vsethi@nvidia.com>; mathieu.poirier@linaro.org;
>> mike.leach@linaro.org; leo.yan@linaro.org
>> Subject: Re: [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF
>> and MCF attribute
>>
>> On 14/08/2022 19:23, Besar Wicaksono wrote:
>>> Add support for NVIDIA System Cache Fabric (SCF) and Memory Control
>>> Fabric (MCF) PMU attributes for CoreSight PMU implementation in
>>> NVIDIA devices.
>>>
>>> Signed-off-by: Besar Wicaksono <bwicaksono@nvidia.com>

>>> +struct nv_cspmu_match {
>>> +     u32 prodid;
>>> +     u32 prodid_mask;
>>> +     u64 filter_mask;
>>> +     const char *name_pattern;
>>> +     enum nv_cspmu_name_fmt name_fmt;
>>> +     struct attribute **event_attr;
>>> +     struct attribute **format_attr;
>>> +};
>>> +
>>> +static const struct nv_cspmu_match nv_cspmu_match[] = {
>>
>> Similar coding style nit below.
>>
> 
> Sure, I will update this.
> 
>>
>> Otherwise,
>>
>> Acked-by: Suzuki K Poulose <suzuki.poulose@arm.com>
> 
> Thanks!
> 
> Unfortunately, we need to update the name of the PMUs and remove
> some of the attributes in the NVIDIA implementation. This requires a change
> in nvidia_cspmu.c and nvidia-pmu.rst. I hope you are fine if I include this
> change in the v5 patch.

That should be fine.

Suzuki

^ permalink raw reply	[flat|nested] 13+ messages in thread

end of thread, other threads:[~2022-09-28 10:47 UTC | newest]

Thread overview: 13+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-08-14 18:23 [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
2022-08-14 18:23 ` [PATCH v4 1/2] perf: arm_cspmu: Add support for ARM CoreSight PMU driver Besar Wicaksono
2022-09-22 13:52   ` Will Deacon
2022-09-27  3:59     ` Besar Wicaksono
2022-09-28  8:31     ` Michael Williams (ATG)
2022-09-27 11:39   ` Suzuki K Poulose
2022-09-28  1:27     ` Besar Wicaksono
2022-08-14 18:23 ` [PATCH v4 2/2] perf: arm_cspmu: Add support for NVIDIA SCF and MCF attribute Besar Wicaksono
2022-09-27 11:42   ` Suzuki K Poulose
2022-09-28  1:38     ` Besar Wicaksono
2022-09-28 10:47       ` Suzuki K Poulose
2022-08-23 17:24 ` [PATCH v4 0/2] perf: ARM CoreSight PMU support Besar Wicaksono
2022-09-22 13:54   ` Will Deacon
