* [PATCH v2 0/5] Cavium ThunderX uncore PMU support
@ 2016-03-09 16:21 ` Jan Glauber
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

This patch series provides access to various counters on the ThunderX SoC.

For details of the uncore implementation see patch #1.

Patches #2-5 add the various ThunderX specific PMUs.

As suggested, I've put the files under drivers/perf/uncore. I prefer
this location over drivers/bus because not all of the uncore drivers
are bus-related.

Changes since v1:
- Added NUMA support
- Fixed CPU hotplug by migrating the PMU context
- Moved files to drivers/perf/uncore
- Removed the OCX FRC and LNE drivers; these will fit better into an EDAC driver
- Improved comments about overflow interrupts
- Removed the maximum device limit
- Trimmed include files

Feedback welcome!
Jan

-------------------------------------------------

Jan Glauber (5):
  arm64/perf: Basic uncore counter support for Cavium ThunderX
  arm64/perf: Cavium ThunderX L2C TAD uncore support
  arm64/perf: Cavium ThunderX L2C CBC uncore support
  arm64/perf: Cavium ThunderX LMC uncore support
  arm64/perf: Cavium ThunderX OCX TLK uncore support

 drivers/perf/Makefile                       |   1 +
 drivers/perf/uncore/Makefile                |   5 +
 drivers/perf/uncore/uncore_cavium.c         | 314 +++++++++++++++
 drivers/perf/uncore/uncore_cavium.h         |  95 +++++
 drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
 drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
 drivers/perf/uncore/uncore_cavium_lmc.c     | 196 +++++++++
 drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
 8 files changed, 1828 insertions(+)
 create mode 100644 drivers/perf/uncore/Makefile
 create mode 100644 drivers/perf/uncore/uncore_cavium.c
 create mode 100644 drivers/perf/uncore/uncore_cavium.h
 create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
 create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
 create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
 create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c

-- 
1.9.1

^ permalink raw reply	[flat|nested] 50+ messages in thread


* [PATCH v2 1/5] arm64/perf: Basic uncore counter support for Cavium ThunderX
  2016-03-09 16:21 ` Jan Glauber
@ 2016-03-09 16:21   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Provide "uncore" facilities for the different non-CPU performance
counter units, based on the Intel/AMD uncore PMU support.

The uncore drivers cover quite different functionality, including the
L2 cache, memory controllers and interconnects.

The uncore PMUs can be found under /sys/bus/event_source/devices.
All counters are exported via sysfs in the corresponding events
files under the PMU directory so the perf tool can list the event names.

There are some points that are special in this implementation:

1) The PMU detection relies on PCI device detection. If a
   matching PCI device is found, the PMU is created. The code can deal
   with multiple units of the same type, e.g. more than one memory
   controller.
   Note: There is also a CPUID check to determine the CPU variant;
   this is needed to support different hardware versions that use
   the same PCI IDs.

2) Counters are summarized across the different units of the same type
   on one NUMA node.
   For instance, L2C TAD 0..7 are presented as a single counter
   (adding the values from TAD 0 to 7). Although this loses the ability
   to read an individual unit's value, the merged values are easier to use.

3) NUMA support. The device's node id is used to group devices by node
   so that counters on one node can be merged. The NUMA node can be
   selected via a new sysfs node attribute.
   Without NUMA support all devices are on node 0.

4) All counters are 64 bit wide without overflow interrupts.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/Makefile               |   1 +
 drivers/perf/uncore/Makefile        |   1 +
 drivers/perf/uncore/uncore_cavium.c | 301 ++++++++++++++++++++++++++++++++++++
 drivers/perf/uncore/uncore_cavium.h |  78 ++++++++++
 4 files changed, 381 insertions(+)
 create mode 100644 drivers/perf/uncore/Makefile
 create mode 100644 drivers/perf/uncore/uncore_cavium.c
 create mode 100644 drivers/perf/uncore/uncore_cavium.h

diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index acd2397..61b6084 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -1 +1,2 @@
 obj-$(CONFIG_ARM_PMU) += arm_pmu.o
+obj-$(CONFIG_ARCH_THUNDER) += uncore/
diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
new file mode 100644
index 0000000..b9c72c2
--- /dev/null
+++ b/drivers/perf/uncore/Makefile
@@ -0,0 +1 @@
+obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
new file mode 100644
index 0000000..4fd5e45
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -0,0 +1,301 @@
+/*
+ * Cavium Thunder uncore PMU support. Derived from Intel and AMD uncore code.
+ *
+ * Copyright (C) 2015,2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/numa.h>
+#include <linux/cpufeature.h>
+
+#include "uncore_cavium.h"
+
+int thunder_uncore_version;
+
+struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
+{
+	return NULL;
+}
+
+void thunder_uncore_read(struct perf_event *event)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	struct thunder_uncore_unit *unit;
+	u64 prev, new = 0;
+	s64 delta;
+
+	node = get_node(hwc->config, uncore);
+
+	/*
+	 * No counter overflow interrupts so we do not
+	 * have to worry about prev_count changing on us.
+	 */
+	prev = local64_read(&hwc->prev_count);
+
+	/* read counter values from all units on the node */
+	list_for_each_entry(unit, &node->unit_list, entry)
+		new += readq(hwc->event_base + unit->map);
+
+	local64_set(&hwc->prev_count, new);
+	delta = new - prev;
+	local64_add(delta, &event->count);
+}
+
+void thunder_uncore_del(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	event->pmu->stop(event, PERF_EF_UPDATE);
+
+	/*
+	 * For programmable counters we need to check where we installed it.
+	 * To keep this function generic always test the more complicated
+	 * case (free running counters won't need the loop).
+	 */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < node->num_counters; i++) {
+		if (cmpxchg(&node->events[i], event, NULL) == event)
+			break;
+	}
+	hwc->idx = -1;
+}
+
+int thunder_uncore_event_init(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	struct thunder_uncore *uncore;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* we do not support sampling */
+	if (is_sampling_event(event))
+		return -EINVAL;
+
+	/* counters do not have these bits */
+	if (event->attr.exclude_user	||
+	    event->attr.exclude_kernel	||
+	    event->attr.exclude_host	||
+	    event->attr.exclude_guest	||
+	    event->attr.exclude_hv	||
+	    event->attr.exclude_idle)
+		return -EINVAL;
+
+	/* counters are 64 bit wide and without overflow interrupts */
+
+	uncore = event_to_thunder_uncore(event);
+	if (!uncore)
+		return -ENODEV;
+	if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
+		return -EINVAL;
+
+	/* check NUMA node */
+	node = get_node(event->attr.config, uncore);
+	if (!node) {
+		pr_debug("Invalid numa node selected\n");
+		return -EINVAL;
+	}
+
+	hwc->config = event->attr.config;
+	hwc->idx = -1;
+	return 0;
+}
+
+/*
+ * Thunder uncore events are independent of CPUs. Provide a cpumask
+ * nevertheless to prevent perf from adding the event per-cpu and just
+ * set the mask to one online CPU. Use the same cpumask for all uncore
+ * devices.
+ */
+static cpumask_t thunder_active_mask;
+
+static ssize_t thunder_uncore_attr_show_cpumask(struct device *dev,
+						struct device_attribute *attr,
+						char *buf)
+{
+	return cpumap_print_to_pagebuf(true, buf, &thunder_active_mask);
+}
+static DEVICE_ATTR(cpumask, S_IRUGO, thunder_uncore_attr_show_cpumask, NULL);
+
+static struct attribute *thunder_uncore_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+struct attribute_group thunder_uncore_attr_group = {
+	.attrs = thunder_uncore_attrs,
+};
+
+ssize_t thunder_events_sysfs_show(struct device *dev,
+				  struct device_attribute *attr,
+				  char *page)
+{
+	struct perf_pmu_events_attr *pmu_attr =
+		container_of(attr, struct perf_pmu_events_attr, attr);
+
+	if (pmu_attr->event_str)
+		return sprintf(page, "%s", pmu_attr->event_str);
+
+	return 0;
+}
+
+/* node attribute depending on number of numa nodes */
+static ssize_t node_show(struct device *dev, struct device_attribute *attr, char *page)
+{
+	if (NODES_SHIFT)
+		return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
+	else
+		return sprintf(page, "config:16\n");
+}
+
+struct device_attribute format_attr_node = __ATTR_RO(node);
+
+static int thunder_uncore_pmu_cpu_notifier(struct notifier_block *nb,
+					   unsigned long action, void *data)
+{
+	struct thunder_uncore *uncore = container_of(nb, struct thunder_uncore, cpu_nb);
+	int new_cpu, old_cpu = (long) data;
+
+	switch (action & ~CPU_TASKS_FROZEN) {
+	case CPU_DOWN_PREPARE:
+		if (!cpumask_test_and_clear_cpu(old_cpu, &thunder_active_mask))
+			break;
+		new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);
+		if (new_cpu >= nr_cpu_ids)
+			break;
+		perf_pmu_migrate_context(uncore->pmu, old_cpu, new_cpu);
+		cpumask_set_cpu(new_cpu, &thunder_active_mask);
+		break;
+	default:
+		break;
+	}
+	return NOTIFY_OK;
+}
+
+static struct thunder_uncore_node *alloc_node(struct thunder_uncore *uncore, int node_id, int counters)
+{
+	struct thunder_uncore_node *node;
+
+	node = kzalloc(sizeof(struct thunder_uncore_node), GFP_KERNEL);
+	if (!node)
+		return NULL;
+	node->num_counters = counters;
+	INIT_LIST_HEAD(&node->unit_list);
+	return node;
+}
+
+int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
+			 unsigned long offset, unsigned long size,
+			 struct pmu *pmu, int counters)
+{
+	struct thunder_uncore_unit  *unit, *tmp;
+	struct thunder_uncore_node *node;
+	struct pci_dev *pdev = NULL;
+	int ret, node_id, found = 0;
+
+	/* detect PCI devices */
+	do {
+		pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, device_id, pdev);
+		if (!pdev)
+			break;
+
+		node_id = dev_to_node(&pdev->dev);
+		/*
+		 * -1 without NUMA, set to 0 because we always have at
+		 *  least node 0.
+		 */
+		if (node_id < 0)
+			node_id = 0;
+
+		/* allocate node if necessary */
+		if (!uncore->nodes[node_id])
+			uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
+
+		node = uncore->nodes[node_id];
+		if (!node) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+
+		unit = kzalloc(sizeof(struct thunder_uncore_unit), GFP_KERNEL);
+		if (!unit) {
+			ret = -ENOMEM;
+			goto fail;
+		}
+
+		unit->pdev = pdev;
+		unit->map = ioremap(pci_resource_start(pdev, 0) + offset, size);
+		list_add(&unit->entry, &node->unit_list);
+		node->nr_units++;
+		found++;
+	} while (1);
+
+	if (!found)
+		return -ENODEV;
+
+	/*
+	 * A perf PMU is CPU-bound, in contrast to our uncore devices.
+	 * Just pick a CPU and migrate away if it goes offline.
+	 */
+	cpumask_set_cpu(smp_processor_id(), &thunder_active_mask);
+
+	uncore->cpu_nb.notifier_call = thunder_uncore_pmu_cpu_notifier;
+	uncore->cpu_nb.priority = CPU_PRI_PERF + 1;
+	ret = register_cpu_notifier(&uncore->cpu_nb);
+	if (ret)
+		goto fail;
+
+	ret = perf_pmu_register(pmu, pmu->name, -1);
+	if (ret)
+		goto fail_pmu;
+
+	uncore->pmu = pmu;
+	return 0;
+
+fail_pmu:
+	unregister_cpu_notifier(&uncore->cpu_nb);
+fail:
+	node_id = 0;
+	while (uncore->nodes[node_id]) {
+		node = uncore->nodes[node_id];
+
+		list_for_each_entry_safe(unit, tmp, &node->unit_list, entry) {
+			if (unit->pdev) {
+				if (unit->map)
+					iounmap(unit->map);
+				pci_dev_put(unit->pdev);
+			}
+			kfree(unit);
+		}
+		kfree(uncore->nodes[node_id]);
+		node_id++;
+	}
+	return ret;
+}
+
+static int __init thunder_uncore_init(void)
+{
+	unsigned long implementor = read_cpuid_implementor();
+	unsigned long part_number = read_cpuid_part_number();
+	u32 variant;
+
+	if (implementor != ARM_CPU_IMP_CAVIUM ||
+	    part_number != CAVIUM_CPU_PART_THUNDERX)
+		return -ENODEV;
+
+	/* detect pass2 which contains different counters */
+	variant = MIDR_VARIANT(read_cpuid_id());
+	if (variant == 1)
+		thunder_uncore_version = 1;
+	pr_info("PMU version: %d\n", thunder_uncore_version);
+
+	return 0;
+}
+late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
new file mode 100644
index 0000000..c799709
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -0,0 +1,78 @@
+#include <linux/perf_event.h>
+#include <linux/pci.h>
+#include <linux/list.h>
+#include <linux/io.h>
+
+#undef pr_fmt
+#define pr_fmt(fmt)     "thunderx_uncore: " fmt
+
+enum uncore_type {
+	NOP_TYPE,
+};
+
+extern int thunder_uncore_version;
+
+#define UNCORE_EVENT_ID_MASK		0xffff
+#define UNCORE_EVENT_ID_SHIFT		16
+
+/* maximum number of parallel hardware counters for all uncore parts */
+#define MAX_COUNTERS			64
+
+struct thunder_uncore_unit {
+	struct list_head entry;
+	void __iomem *map;
+	struct pci_dev *pdev;
+};
+
+struct thunder_uncore_node {
+	int nr_units;
+	int num_counters;
+	struct list_head unit_list;
+	struct perf_event *events[MAX_COUNTERS];
+};
+
+/* generic uncore struct for different pmu types */
+struct thunder_uncore {
+	int type;
+	struct pmu *pmu;
+	int (*event_valid)(u64);
+	struct notifier_block cpu_nb;
+	struct thunder_uncore_node *nodes[MAX_NUMNODES];
+};
+
+#define EVENT_PTR(_id) (&event_attr_##_id.attr.attr)
+
+#define EVENT_ATTR(_name, _val)						   \
+static struct perf_pmu_events_attr event_attr_##_name = {		   \
+	.attr	   = __ATTR(_name, 0444, thunder_events_sysfs_show, NULL), \
+	.event_str = "event=" __stringify(_val),			   \
+}
+
+#define EVENT_ATTR_STR(_name, _str)					   \
+static struct perf_pmu_events_attr event_attr_##_name = {		   \
+	.attr	   = __ATTR(_name, 0444, thunder_events_sysfs_show, NULL), \
+	.event_str = _str,						   \
+}
+
+static inline struct thunder_uncore_node *get_node(u64 config,
+					struct thunder_uncore *uncore)
+{
+	return uncore->nodes[config >> UNCORE_EVENT_ID_SHIFT];
+}
+
+#define get_id(config) (config & UNCORE_EVENT_ID_MASK)
+
+extern struct attribute_group thunder_uncore_attr_group;
+extern struct device_attribute format_attr_node;
+
+/* Prototypes */
+struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
+void thunder_uncore_del(struct perf_event *event, int flags);
+int thunder_uncore_event_init(struct perf_event *event);
+void thunder_uncore_read(struct perf_event *event);
+int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
+			 unsigned long offset, unsigned long size,
+			 struct pmu *pmu, int counters);
+ssize_t thunder_events_sysfs_show(struct device *dev,
+				  struct device_attribute *attr,
+				  char *page);
-- 
1.9.1



* [PATCH v2 2/5] arm64/perf: Cavium ThunderX L2C TAD uncore support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-03-09 16:21   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Support the counters of the L2 cache tag-and-data (TAD) units.

Also support the counters added or modified in pass 2 silicon by
checking the MIDR.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/uncore/Makefile                |   3 +-
 drivers/perf/uncore/uncore_cavium.c         |   6 +-
 drivers/perf/uncore/uncore_cavium.h         |   7 +-
 drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
 4 files changed, 613 insertions(+), 3 deletions(-)
 create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c

diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index b9c72c2..6a16caf 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1 +1,2 @@
-obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
+obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
+			      uncore_cavium_l2c_tad.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index 4fd5e45..b92b2ae 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -15,7 +15,10 @@ int thunder_uncore_version;
 
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
 {
-	return NULL;
+	if (event->pmu->type == thunder_l2c_tad_pmu.type)
+		return thunder_uncore_l2c_tad;
+	else
+		return NULL;
 }
 
 void thunder_uncore_read(struct perf_event *event)
@@ -296,6 +299,7 @@ static int __init thunder_uncore_init(void)
 		thunder_uncore_version = 1;
 	pr_info("PMU version: %d\n", thunder_uncore_version);
 
+	thunder_uncore_l2c_tad_setup();
 	return 0;
 }
 late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index c799709..7a9c367 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -7,7 +7,7 @@
 #define pr_fmt(fmt)     "thunderx_uncore: " fmt
 
 enum uncore_type {
-	NOP_TYPE,
+	L2C_TAD_TYPE,
 };
 
 extern int thunder_uncore_version;
@@ -65,6 +65,9 @@ static inline struct thunder_uncore_node *get_node(u64 config,
 extern struct attribute_group thunder_uncore_attr_group;
 extern struct device_attribute format_attr_node;
 
+extern struct thunder_uncore *thunder_uncore_l2c_tad;
+extern struct pmu thunder_l2c_tad_pmu;
+
 /* Prototypes */
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
 void thunder_uncore_del(struct perf_event *event, int flags);
@@ -76,3 +79,5 @@ int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
 ssize_t thunder_events_sysfs_show(struct device *dev,
 				  struct device_attribute *attr,
 				  char *page);
+
+int thunder_uncore_l2c_tad_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_l2c_tad.c b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
new file mode 100644
index 0000000..c8dc305
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
@@ -0,0 +1,600 @@
+/*
+ * Cavium Thunder uncore PMU support, L2C TAD counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_L2C_TAD
+#define PCI_DEVICE_ID_THUNDER_L2C_TAD	0xa02e
+#endif
+
+#define L2C_TAD_NR_COUNTERS             4
+#define L2C_TAD_CONTROL_OFFSET		0x10000
+#define L2C_TAD_COUNTER_OFFSET		0x100
+
+/* L2C TAD event list */
+#define L2C_TAD_EVENTS_DISABLED		0x00
+
+#define L2C_TAD_EVENT_L2T_HIT		0x01
+#define L2C_TAD_EVENT_L2T_MISS		0x02
+#define L2C_TAD_EVENT_L2T_NOALLOC	0x03
+#define L2C_TAD_EVENT_L2_VIC		0x04
+#define L2C_TAD_EVENT_SC_FAIL		0x05
+#define L2C_TAD_EVENT_SC_PASS		0x06
+#define L2C_TAD_EVENT_LFB_OCC		0x07
+#define L2C_TAD_EVENT_WAIT_LFB		0x08
+#define L2C_TAD_EVENT_WAIT_VAB		0x09
+
+#define L2C_TAD_EVENT_RTG_HIT		0x41
+#define L2C_TAD_EVENT_RTG_MISS		0x42
+#define L2C_TAD_EVENT_L2_RTG_VIC	0x44
+#define L2C_TAD_EVENT_L2_OPEN_OCI	0x48
+
+#define L2C_TAD_EVENT_QD0_IDX		0x80
+#define L2C_TAD_EVENT_QD0_RDAT		0x81
+#define L2C_TAD_EVENT_QD0_BNKS		0x82
+#define L2C_TAD_EVENT_QD0_WDAT		0x83
+
+#define L2C_TAD_EVENT_QD1_IDX		0x90
+#define L2C_TAD_EVENT_QD1_RDAT		0x91
+#define L2C_TAD_EVENT_QD1_BNKS		0x92
+#define L2C_TAD_EVENT_QD1_WDAT		0x93
+
+#define L2C_TAD_EVENT_QD2_IDX		0xa0
+#define L2C_TAD_EVENT_QD2_RDAT		0xa1
+#define L2C_TAD_EVENT_QD2_BNKS		0xa2
+#define L2C_TAD_EVENT_QD2_WDAT		0xa3
+
+#define L2C_TAD_EVENT_QD3_IDX		0xb0
+#define L2C_TAD_EVENT_QD3_RDAT		0xb1
+#define L2C_TAD_EVENT_QD3_BNKS		0xb2
+#define L2C_TAD_EVENT_QD3_WDAT		0xb3
+
+#define L2C_TAD_EVENT_QD4_IDX		0xc0
+#define L2C_TAD_EVENT_QD4_RDAT		0xc1
+#define L2C_TAD_EVENT_QD4_BNKS		0xc2
+#define L2C_TAD_EVENT_QD4_WDAT		0xc3
+
+#define L2C_TAD_EVENT_QD5_IDX		0xd0
+#define L2C_TAD_EVENT_QD5_RDAT		0xd1
+#define L2C_TAD_EVENT_QD5_BNKS		0xd2
+#define L2C_TAD_EVENT_QD5_WDAT		0xd3
+
+#define L2C_TAD_EVENT_QD6_IDX		0xe0
+#define L2C_TAD_EVENT_QD6_RDAT		0xe1
+#define L2C_TAD_EVENT_QD6_BNKS		0xe2
+#define L2C_TAD_EVENT_QD6_WDAT		0xe3
+
+#define L2C_TAD_EVENT_QD7_IDX		0xf0
+#define L2C_TAD_EVENT_QD7_RDAT		0xf1
+#define L2C_TAD_EVENT_QD7_BNKS		0xf2
+#define L2C_TAD_EVENT_QD7_WDAT		0xf3
+
+/* pass2 added/changed event list */
+#define L2C_TAD_EVENT_OPEN_CCPI			0x0a
+#define L2C_TAD_EVENT_LOOKUP			0x40
+#define L2C_TAD_EVENT_LOOKUP_XMC_LCL		0x41
+#define L2C_TAD_EVENT_LOOKUP_XMC_RMT		0x42
+#define L2C_TAD_EVENT_LOOKUP_MIB		0x43
+#define L2C_TAD_EVENT_LOOKUP_ALL		0x44
+#define L2C_TAD_EVENT_TAG_ALC_HIT		0x48
+#define L2C_TAD_EVENT_TAG_ALC_MISS		0x49
+#define L2C_TAD_EVENT_TAG_ALC_NALC		0x4a
+#define L2C_TAD_EVENT_TAG_NALC_HIT		0x4b
+#define L2C_TAD_EVENT_TAG_NALC_MISS		0x4c
+#define L2C_TAD_EVENT_LMC_WR			0x4e
+#define L2C_TAD_EVENT_LMC_SBLKDTY		0x4f
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HIT		0x50
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HITE		0x51
+#define L2C_TAD_EVENT_TAG_ALC_RTG_HITS		0x52
+#define L2C_TAD_EVENT_TAG_ALC_RTG_MISS		0x53
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HIT		0x54
+#define L2C_TAD_EVENT_TAG_NALC_RTG_MISS		0x55
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HITE		0x56
+#define L2C_TAD_EVENT_TAG_NALC_RTG_HITS		0x57
+#define L2C_TAD_EVENT_TAG_ALC_LCL_EVICT		0x58
+#define L2C_TAD_EVENT_TAG_ALC_LCL_CLNVIC	0x59
+#define L2C_TAD_EVENT_TAG_ALC_LCL_DTYVIC	0x5a
+#define L2C_TAD_EVENT_TAG_ALC_RMT_EVICT		0x5b
+#define L2C_TAD_EVENT_TAG_ALC_RMT_VIC		0x5c
+#define L2C_TAD_EVENT_RTG_ALC			0x5d
+#define L2C_TAD_EVENT_RTG_ALC_HIT		0x5e
+#define L2C_TAD_EVENT_RTG_ALC_HITWB		0x5f
+#define L2C_TAD_EVENT_STC_TOTAL			0x60
+#define L2C_TAD_EVENT_STC_TOTAL_FAIL		0x61
+#define L2C_TAD_EVENT_STC_RMT			0x62
+#define L2C_TAD_EVENT_STC_RMT_FAIL		0x63
+#define L2C_TAD_EVENT_STC_LCL			0x64
+#define L2C_TAD_EVENT_STC_LCL_FAIL		0x65
+#define L2C_TAD_EVENT_OCI_RTG_WAIT		0x68
+#define L2C_TAD_EVENT_OCI_FWD_CYC_HIT		0x69
+#define L2C_TAD_EVENT_OCI_FWD_RACE		0x6a
+#define L2C_TAD_EVENT_OCI_HAKS			0x6b
+#define L2C_TAD_EVENT_OCI_FLDX_TAG_E_NODAT	0x6c
+#define L2C_TAD_EVENT_OCI_FLDX_TAG_E_DAT	0x6d
+#define L2C_TAD_EVENT_OCI_RLDD			0x6e
+#define L2C_TAD_EVENT_OCI_RLDD_PEMD		0x6f
+#define L2C_TAD_EVENT_OCI_RRQ_DAT_CNT		0x70
+#define L2C_TAD_EVENT_OCI_RRQ_DAT_DMASK		0x71
+#define L2C_TAD_EVENT_OCI_RSP_DAT_CNT		0x72
+#define L2C_TAD_EVENT_OCI_RSP_DAT_DMASK		0x73
+#define L2C_TAD_EVENT_OCI_RSP_DAT_VICD_CNT	0x74
+#define L2C_TAD_EVENT_OCI_RSP_DAT_VICD_DMASK	0x75
+#define L2C_TAD_EVENT_OCI_RTG_ALC_EVICT		0x76
+#define L2C_TAD_EVENT_OCI_RTG_ALC_VIC		0x77
+
+struct thunder_uncore *thunder_uncore_l2c_tad;
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	struct thunder_uncore_unit *unit;
+	u64 prev;
+	int id;
+
+	node = get_node(hwc->config, uncore);
+	id = get_id(hwc->config);
+
+	/* restore counter value divided by units into all counters */
+	if (flags & PERF_EF_RELOAD) {
+		prev = local64_read(&hwc->prev_count);
+		prev = prev / node->nr_units;
+
+		list_for_each_entry(unit, &node->unit_list, entry)
+			writeq(prev, hwc->event_base + unit->map);
+	}
+
+	hwc->state = 0;
+
+	/* write byte in control registers for all units on the node */
+	list_for_each_entry(unit, &node->unit_list, entry)
+		writeb(id, hwc->config_base + unit->map);
+
+	perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	struct thunder_uncore_unit *unit;
+
+	/* reset selection value for all units on the node */
+	node = get_node(hwc->config, uncore);
+
+	list_for_each_entry(unit, &node->unit_list, entry)
+		writeb(L2C_TAD_EVENTS_DISABLED, hwc->config_base + unit->map);
+	hwc->state |= PERF_HES_STOPPED;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		thunder_uncore_read(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	WARN_ON_ONCE(!uncore);
+	node = get_node(hwc->config, uncore);
+
+	/* are we already assigned? */
+	if (hwc->idx != -1 && node->events[hwc->idx] == event)
+		goto out;
+
+	for (i = 0; i < node->num_counters; i++) {
+		if (node->events[i] == event) {
+			hwc->idx = i;
+			goto out;
+		}
+	}
+
+	/* if not, take the first available counter */
+	hwc->idx = -1;
+	for (i = 0; i < node->num_counters; i++) {
+		if (cmpxchg(&node->events[i], NULL, event) == NULL) {
+			hwc->idx = i;
+			break;
+		}
+	}
+out:
+	if (hwc->idx == -1)
+		return -EBUSY;
+
+	hwc->config_base = hwc->idx;
+	hwc->event_base = L2C_TAD_COUNTER_OFFSET +
+			  hwc->idx * sizeof(unsigned long long);
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	if (flags & PERF_EF_START)
+		thunder_uncore_start(event, PERF_EF_RELOAD);
+	return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-7");
+
+static struct attribute *thunder_l2c_tad_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_node.attr,
+	NULL,
+};
+
+static struct attribute_group thunder_l2c_tad_format_group = {
+	.name = "format",
+	.attrs = thunder_l2c_tad_format_attr,
+};
+
+EVENT_ATTR(l2t_hit,	L2C_TAD_EVENT_L2T_HIT);
+EVENT_ATTR(l2t_miss,	L2C_TAD_EVENT_L2T_MISS);
+EVENT_ATTR(l2t_noalloc,	L2C_TAD_EVENT_L2T_NOALLOC);
+EVENT_ATTR(l2_vic,	L2C_TAD_EVENT_L2_VIC);
+EVENT_ATTR(sc_fail,	L2C_TAD_EVENT_SC_FAIL);
+EVENT_ATTR(sc_pass,	L2C_TAD_EVENT_SC_PASS);
+EVENT_ATTR(lfb_occ,	L2C_TAD_EVENT_LFB_OCC);
+EVENT_ATTR(wait_lfb,	L2C_TAD_EVENT_WAIT_LFB);
+EVENT_ATTR(wait_vab,	L2C_TAD_EVENT_WAIT_VAB);
+EVENT_ATTR(rtg_hit,	L2C_TAD_EVENT_RTG_HIT);
+EVENT_ATTR(rtg_miss,	L2C_TAD_EVENT_RTG_MISS);
+EVENT_ATTR(l2_rtg_vic,	L2C_TAD_EVENT_L2_RTG_VIC);
+EVENT_ATTR(l2_open_oci,	L2C_TAD_EVENT_L2_OPEN_OCI);
+
+EVENT_ATTR(qd0_idx,	L2C_TAD_EVENT_QD0_IDX);
+EVENT_ATTR(qd0_rdat,	L2C_TAD_EVENT_QD0_RDAT);
+EVENT_ATTR(qd0_bnks,	L2C_TAD_EVENT_QD0_BNKS);
+EVENT_ATTR(qd0_wdat,	L2C_TAD_EVENT_QD0_WDAT);
+
+EVENT_ATTR(qd1_idx,	L2C_TAD_EVENT_QD1_IDX);
+EVENT_ATTR(qd1_rdat,	L2C_TAD_EVENT_QD1_RDAT);
+EVENT_ATTR(qd1_bnks,	L2C_TAD_EVENT_QD1_BNKS);
+EVENT_ATTR(qd1_wdat,	L2C_TAD_EVENT_QD1_WDAT);
+
+EVENT_ATTR(qd2_idx,	L2C_TAD_EVENT_QD2_IDX);
+EVENT_ATTR(qd2_rdat,	L2C_TAD_EVENT_QD2_RDAT);
+EVENT_ATTR(qd2_bnks,	L2C_TAD_EVENT_QD2_BNKS);
+EVENT_ATTR(qd2_wdat,	L2C_TAD_EVENT_QD2_WDAT);
+
+EVENT_ATTR(qd3_idx,	L2C_TAD_EVENT_QD3_IDX);
+EVENT_ATTR(qd3_rdat,	L2C_TAD_EVENT_QD3_RDAT);
+EVENT_ATTR(qd3_bnks,	L2C_TAD_EVENT_QD3_BNKS);
+EVENT_ATTR(qd3_wdat,	L2C_TAD_EVENT_QD3_WDAT);
+
+EVENT_ATTR(qd4_idx,	L2C_TAD_EVENT_QD4_IDX);
+EVENT_ATTR(qd4_rdat,	L2C_TAD_EVENT_QD4_RDAT);
+EVENT_ATTR(qd4_bnks,	L2C_TAD_EVENT_QD4_BNKS);
+EVENT_ATTR(qd4_wdat,	L2C_TAD_EVENT_QD4_WDAT);
+
+EVENT_ATTR(qd5_idx,	L2C_TAD_EVENT_QD5_IDX);
+EVENT_ATTR(qd5_rdat,	L2C_TAD_EVENT_QD5_RDAT);
+EVENT_ATTR(qd5_bnks,	L2C_TAD_EVENT_QD5_BNKS);
+EVENT_ATTR(qd5_wdat,	L2C_TAD_EVENT_QD5_WDAT);
+
+EVENT_ATTR(qd6_idx,	L2C_TAD_EVENT_QD6_IDX);
+EVENT_ATTR(qd6_rdat,	L2C_TAD_EVENT_QD6_RDAT);
+EVENT_ATTR(qd6_bnks,	L2C_TAD_EVENT_QD6_BNKS);
+EVENT_ATTR(qd6_wdat,	L2C_TAD_EVENT_QD6_WDAT);
+
+EVENT_ATTR(qd7_idx,	L2C_TAD_EVENT_QD7_IDX);
+EVENT_ATTR(qd7_rdat,	L2C_TAD_EVENT_QD7_RDAT);
+EVENT_ATTR(qd7_bnks,	L2C_TAD_EVENT_QD7_BNKS);
+EVENT_ATTR(qd7_wdat,	L2C_TAD_EVENT_QD7_WDAT);
+
+static struct attribute *thunder_l2c_tad_events_attr[] = {
+	EVENT_PTR(l2t_hit),
+	EVENT_PTR(l2t_miss),
+	EVENT_PTR(l2t_noalloc),
+	EVENT_PTR(l2_vic),
+	EVENT_PTR(sc_fail),
+	EVENT_PTR(sc_pass),
+	EVENT_PTR(lfb_occ),
+	EVENT_PTR(wait_lfb),
+	EVENT_PTR(wait_vab),
+	EVENT_PTR(rtg_hit),
+	EVENT_PTR(rtg_miss),
+	EVENT_PTR(l2_rtg_vic),
+	EVENT_PTR(l2_open_oci),
+
+	EVENT_PTR(qd0_idx),
+	EVENT_PTR(qd0_rdat),
+	EVENT_PTR(qd0_bnks),
+	EVENT_PTR(qd0_wdat),
+
+	EVENT_PTR(qd1_idx),
+	EVENT_PTR(qd1_rdat),
+	EVENT_PTR(qd1_bnks),
+	EVENT_PTR(qd1_wdat),
+
+	EVENT_PTR(qd2_idx),
+	EVENT_PTR(qd2_rdat),
+	EVENT_PTR(qd2_bnks),
+	EVENT_PTR(qd2_wdat),
+
+	EVENT_PTR(qd3_idx),
+	EVENT_PTR(qd3_rdat),
+	EVENT_PTR(qd3_bnks),
+	EVENT_PTR(qd3_wdat),
+
+	EVENT_PTR(qd4_idx),
+	EVENT_PTR(qd4_rdat),
+	EVENT_PTR(qd4_bnks),
+	EVENT_PTR(qd4_wdat),
+
+	EVENT_PTR(qd5_idx),
+	EVENT_PTR(qd5_rdat),
+	EVENT_PTR(qd5_bnks),
+	EVENT_PTR(qd5_wdat),
+
+	EVENT_PTR(qd6_idx),
+	EVENT_PTR(qd6_rdat),
+	EVENT_PTR(qd6_bnks),
+	EVENT_PTR(qd6_wdat),
+
+	EVENT_PTR(qd7_idx),
+	EVENT_PTR(qd7_rdat),
+	EVENT_PTR(qd7_bnks),
+	EVENT_PTR(qd7_wdat),
+	NULL,
+};
+
+/* pass2 added/changed events */
+EVENT_ATTR(open_ccpi,		L2C_TAD_EVENT_OPEN_CCPI);
+EVENT_ATTR(lookup,		L2C_TAD_EVENT_LOOKUP);
+EVENT_ATTR(lookup_xmc_lcl,	L2C_TAD_EVENT_LOOKUP_XMC_LCL);
+EVENT_ATTR(lookup_xmc_rmt,	L2C_TAD_EVENT_LOOKUP_XMC_RMT);
+EVENT_ATTR(lookup_mib,		L2C_TAD_EVENT_LOOKUP_MIB);
+EVENT_ATTR(lookup_all,		L2C_TAD_EVENT_LOOKUP_ALL);
+
+EVENT_ATTR(tag_alc_hit,		L2C_TAD_EVENT_TAG_ALC_HIT);
+EVENT_ATTR(tag_alc_miss,	L2C_TAD_EVENT_TAG_ALC_MISS);
+EVENT_ATTR(tag_alc_nalc,	L2C_TAD_EVENT_TAG_ALC_NALC);
+EVENT_ATTR(tag_nalc_hit,	L2C_TAD_EVENT_TAG_NALC_HIT);
+EVENT_ATTR(tag_nalc_miss,	L2C_TAD_EVENT_TAG_NALC_MISS);
+
+EVENT_ATTR(lmc_wr,		L2C_TAD_EVENT_LMC_WR);
+EVENT_ATTR(lmc_sblkdty,		L2C_TAD_EVENT_LMC_SBLKDTY);
+
+EVENT_ATTR(tag_alc_rtg_hit,	L2C_TAD_EVENT_TAG_ALC_RTG_HIT);
+EVENT_ATTR(tag_alc_rtg_hite,	L2C_TAD_EVENT_TAG_ALC_RTG_HITE);
+EVENT_ATTR(tag_alc_rtg_hits,	L2C_TAD_EVENT_TAG_ALC_RTG_HITS);
+EVENT_ATTR(tag_alc_rtg_miss,	L2C_TAD_EVENT_TAG_ALC_RTG_MISS);
+EVENT_ATTR(tag_alc_nalc_rtg_hit, L2C_TAD_EVENT_TAG_NALC_RTG_HIT);
+EVENT_ATTR(tag_nalc_rtg_miss,	L2C_TAD_EVENT_TAG_NALC_RTG_MISS);
+EVENT_ATTR(tag_nalc_rtg_hite,	L2C_TAD_EVENT_TAG_NALC_RTG_HITE);
+EVENT_ATTR(tag_nalc_rtg_hits,	L2C_TAD_EVENT_TAG_NALC_RTG_HITS);
+EVENT_ATTR(tag_alc_lcl_evict,	L2C_TAD_EVENT_TAG_ALC_LCL_EVICT);
+EVENT_ATTR(tag_alc_lcl_clnvic,	L2C_TAD_EVENT_TAG_ALC_LCL_CLNVIC);
+EVENT_ATTR(tag_alc_lcl_dtyvic,	L2C_TAD_EVENT_TAG_ALC_LCL_DTYVIC);
+EVENT_ATTR(tag_alc_rmt_evict,	L2C_TAD_EVENT_TAG_ALC_RMT_EVICT);
+EVENT_ATTR(tag_alc_rmt_vic,	L2C_TAD_EVENT_TAG_ALC_RMT_VIC);
+
+EVENT_ATTR(rtg_alc,		L2C_TAD_EVENT_RTG_ALC);
+EVENT_ATTR(rtg_alc_hit,		L2C_TAD_EVENT_RTG_ALC_HIT);
+EVENT_ATTR(rtg_alc_hitwb,	L2C_TAD_EVENT_RTG_ALC_HITWB);
+
+EVENT_ATTR(stc_total,		L2C_TAD_EVENT_STC_TOTAL);
+EVENT_ATTR(stc_total_fail,	L2C_TAD_EVENT_STC_TOTAL_FAIL);
+EVENT_ATTR(stc_rmt,		L2C_TAD_EVENT_STC_RMT);
+EVENT_ATTR(stc_rmt_fail,	L2C_TAD_EVENT_STC_RMT_FAIL);
+EVENT_ATTR(stc_lcl,		L2C_TAD_EVENT_STC_LCL);
+EVENT_ATTR(stc_lcl_fail,	L2C_TAD_EVENT_STC_LCL_FAIL);
+
+EVENT_ATTR(oci_rtg_wait,	L2C_TAD_EVENT_OCI_RTG_WAIT);
+EVENT_ATTR(oci_fwd_cyc_hit,	L2C_TAD_EVENT_OCI_FWD_CYC_HIT);
+EVENT_ATTR(oci_fwd_race,	L2C_TAD_EVENT_OCI_FWD_RACE);
+EVENT_ATTR(oci_haks,		L2C_TAD_EVENT_OCI_HAKS);
+EVENT_ATTR(oci_fldx_tag_e_nodat, L2C_TAD_EVENT_OCI_FLDX_TAG_E_NODAT);
+EVENT_ATTR(oci_fldx_tag_e_dat,	L2C_TAD_EVENT_OCI_FLDX_TAG_E_DAT);
+EVENT_ATTR(oci_rldd,		L2C_TAD_EVENT_OCI_RLDD);
+EVENT_ATTR(oci_rldd_pemd,	L2C_TAD_EVENT_OCI_RLDD_PEMD);
+EVENT_ATTR(oci_rrq_dat_cnt,	L2C_TAD_EVENT_OCI_RRQ_DAT_CNT);
+EVENT_ATTR(oci_rrq_dat_dmask,	L2C_TAD_EVENT_OCI_RRQ_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_cnt,	L2C_TAD_EVENT_OCI_RSP_DAT_CNT);
+EVENT_ATTR(oci_rsp_dat_dmaks,	L2C_TAD_EVENT_OCI_RSP_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_vicd_cnt, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_CNT);
+EVENT_ATTR(oci_rsp_dat_vicd_dmask, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_DMASK);
+EVENT_ATTR(oci_rtg_alc_evict,	L2C_TAD_EVENT_OCI_RTG_ALC_EVICT);
+EVENT_ATTR(oci_rtg_alc_vic,	L2C_TAD_EVENT_OCI_RTG_ALC_VIC);
+
+static struct attribute *thunder_l2c_tad_pass2_events_attr[] = {
+	EVENT_PTR(l2t_hit),
+	EVENT_PTR(l2t_miss),
+	EVENT_PTR(l2t_noalloc),
+	EVENT_PTR(l2_vic),
+	EVENT_PTR(sc_fail),
+	EVENT_PTR(sc_pass),
+	EVENT_PTR(lfb_occ),
+	EVENT_PTR(wait_lfb),
+	EVENT_PTR(wait_vab),
+	EVENT_PTR(open_ccpi),
+
+	EVENT_PTR(lookup),
+	EVENT_PTR(lookup_xmc_lcl),
+	EVENT_PTR(lookup_xmc_rmt),
+	EVENT_PTR(lookup_mib),
+	EVENT_PTR(lookup_all),
+
+	EVENT_PTR(tag_alc_hit),
+	EVENT_PTR(tag_alc_miss),
+	EVENT_PTR(tag_alc_nalc),
+	EVENT_PTR(tag_nalc_hit),
+	EVENT_PTR(tag_nalc_miss),
+
+	EVENT_PTR(lmc_wr),
+	EVENT_PTR(lmc_sblkdty),
+
+	EVENT_PTR(tag_alc_rtg_hit),
+	EVENT_PTR(tag_alc_rtg_hite),
+	EVENT_PTR(tag_alc_rtg_hits),
+	EVENT_PTR(tag_alc_rtg_miss),
+	EVENT_PTR(tag_alc_nalc_rtg_hit),
+	EVENT_PTR(tag_nalc_rtg_miss),
+	EVENT_PTR(tag_nalc_rtg_hite),
+	EVENT_PTR(tag_nalc_rtg_hits),
+	EVENT_PTR(tag_alc_lcl_evict),
+	EVENT_PTR(tag_alc_lcl_clnvic),
+	EVENT_PTR(tag_alc_lcl_dtyvic),
+	EVENT_PTR(tag_alc_rmt_evict),
+	EVENT_PTR(tag_alc_rmt_vic),
+
+	EVENT_PTR(rtg_alc),
+	EVENT_PTR(rtg_alc_hit),
+	EVENT_PTR(rtg_alc_hitwb),
+
+	EVENT_PTR(stc_total),
+	EVENT_PTR(stc_total_fail),
+	EVENT_PTR(stc_rmt),
+	EVENT_PTR(stc_rmt_fail),
+	EVENT_PTR(stc_lcl),
+	EVENT_PTR(stc_lcl_fail),
+
+	EVENT_PTR(oci_rtg_wait),
+	EVENT_PTR(oci_fwd_cyc_hit),
+	EVENT_PTR(oci_fwd_race),
+	EVENT_PTR(oci_haks),
+	EVENT_PTR(oci_fldx_tag_e_nodat),
+	EVENT_PTR(oci_fldx_tag_e_dat),
+	EVENT_PTR(oci_rldd),
+	EVENT_PTR(oci_rldd_pemd),
+	EVENT_PTR(oci_rrq_dat_cnt),
+	EVENT_PTR(oci_rrq_dat_dmask),
+	EVENT_PTR(oci_rsp_dat_cnt),
+	EVENT_PTR(oci_rsp_dat_dmaks),
+	EVENT_PTR(oci_rsp_dat_vicd_cnt),
+	EVENT_PTR(oci_rsp_dat_vicd_dmask),
+	EVENT_PTR(oci_rtg_alc_evict),
+	EVENT_PTR(oci_rtg_alc_vic),
+
+	EVENT_PTR(qd0_idx),
+	EVENT_PTR(qd0_rdat),
+	EVENT_PTR(qd0_bnks),
+	EVENT_PTR(qd0_wdat),
+
+	EVENT_PTR(qd1_idx),
+	EVENT_PTR(qd1_rdat),
+	EVENT_PTR(qd1_bnks),
+	EVENT_PTR(qd1_wdat),
+
+	EVENT_PTR(qd2_idx),
+	EVENT_PTR(qd2_rdat),
+	EVENT_PTR(qd2_bnks),
+	EVENT_PTR(qd2_wdat),
+
+	EVENT_PTR(qd3_idx),
+	EVENT_PTR(qd3_rdat),
+	EVENT_PTR(qd3_bnks),
+	EVENT_PTR(qd3_wdat),
+
+	EVENT_PTR(qd4_idx),
+	EVENT_PTR(qd4_rdat),
+	EVENT_PTR(qd4_bnks),
+	EVENT_PTR(qd4_wdat),
+
+	EVENT_PTR(qd5_idx),
+	EVENT_PTR(qd5_rdat),
+	EVENT_PTR(qd5_bnks),
+	EVENT_PTR(qd5_wdat),
+
+	EVENT_PTR(qd6_idx),
+	EVENT_PTR(qd6_rdat),
+	EVENT_PTR(qd6_bnks),
+	EVENT_PTR(qd6_wdat),
+
+	EVENT_PTR(qd7_idx),
+	EVENT_PTR(qd7_rdat),
+	EVENT_PTR(qd7_bnks),
+	EVENT_PTR(qd7_wdat),
+	NULL,
+};
+
+static struct attribute_group thunder_l2c_tad_events_group = {
+	.name = "events",
+	.attrs = NULL,
+};
+
+static const struct attribute_group *thunder_l2c_tad_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_l2c_tad_format_group,
+	&thunder_l2c_tad_events_group,
+	NULL,
+};
+
+struct pmu thunder_l2c_tad_pmu = {
+	.attr_groups	= thunder_l2c_tad_attr_groups,
+	.name		= "thunder_l2c_tad",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+	if ((config > 0 && config <= L2C_TAD_EVENT_WAIT_VAB) ||
+	    config == L2C_TAD_EVENT_RTG_HIT ||
+	    config == L2C_TAD_EVENT_RTG_MISS ||
+	    config == L2C_TAD_EVENT_L2_RTG_VIC ||
+	    config == L2C_TAD_EVENT_L2_OPEN_OCI ||
+	    ((config & 0x80) && ((config & 0xf) <= 3)))
+		return 1;
+
+	if (thunder_uncore_version == 1)
+		if (config == L2C_TAD_EVENT_OPEN_CCPI ||
+		    (config >= L2C_TAD_EVENT_LOOKUP &&
+		     config <= L2C_TAD_EVENT_LOOKUP_ALL) ||
+		    (config >= L2C_TAD_EVENT_TAG_ALC_HIT &&
+		     config <= L2C_TAD_EVENT_OCI_RTG_ALC_VIC &&
+		     config != 0x4d &&
+		     config != 0x66 &&
+		     config != 0x67))
+			return 1;
+
+	return 0;
+}
+
+int __init thunder_uncore_l2c_tad_setup(void)
+{
+	int ret = -ENOMEM;
+
+	thunder_uncore_l2c_tad = kzalloc(sizeof(struct thunder_uncore),
+					 GFP_KERNEL);
+	if (!thunder_uncore_l2c_tad)
+		goto fail_nomem;
+
+	if (thunder_uncore_version == 0)
+		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_events_attr;
+	else /* default */
+		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_pass2_events_attr;
+
+	ret = thunder_uncore_setup(thunder_uncore_l2c_tad,
+			   PCI_DEVICE_ID_THUNDER_L2C_TAD,
+			   L2C_TAD_CONTROL_OFFSET,
+			   L2C_TAD_COUNTER_OFFSET + L2C_TAD_NR_COUNTERS
+				* sizeof(unsigned long long),
+			   &thunder_l2c_tad_pmu,
+			   L2C_TAD_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_l2c_tad->type = L2C_TAD_TYPE;
+	thunder_uncore_l2c_tad->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_l2c_tad);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1

+	EVENT_PTR(qd2_bnks),
+	EVENT_PTR(qd2_wdat),
+
+	EVENT_PTR(qd3_idx),
+	EVENT_PTR(qd3_rdat),
+	EVENT_PTR(qd3_bnks),
+	EVENT_PTR(qd3_wdat),
+
+	EVENT_PTR(qd4_idx),
+	EVENT_PTR(qd4_rdat),
+	EVENT_PTR(qd4_bnks),
+	EVENT_PTR(qd4_wdat),
+
+	EVENT_PTR(qd5_idx),
+	EVENT_PTR(qd5_rdat),
+	EVENT_PTR(qd5_bnks),
+	EVENT_PTR(qd5_wdat),
+
+	EVENT_PTR(qd6_idx),
+	EVENT_PTR(qd6_rdat),
+	EVENT_PTR(qd6_bnks),
+	EVENT_PTR(qd6_wdat),
+
+	EVENT_PTR(qd7_idx),
+	EVENT_PTR(qd7_rdat),
+	EVENT_PTR(qd7_bnks),
+	EVENT_PTR(qd7_wdat),
+	NULL,
+};
+
+/* pass2 added/changed events */
+EVENT_ATTR(open_ccpi,		L2C_TAD_EVENT_OPEN_CCPI);
+EVENT_ATTR(lookup,		L2C_TAD_EVENT_LOOKUP);
+EVENT_ATTR(lookup_xmc_lcl,	L2C_TAD_EVENT_LOOKUP_XMC_LCL);
+EVENT_ATTR(lookup_xmc_rmt,	L2C_TAD_EVENT_LOOKUP_XMC_RMT);
+EVENT_ATTR(lookup_mib,		L2C_TAD_EVENT_LOOKUP_MIB);
+EVENT_ATTR(lookup_all,		L2C_TAD_EVENT_LOOKUP_ALL);
+
+EVENT_ATTR(tag_alc_hit,		L2C_TAD_EVENT_TAG_ALC_HIT);
+EVENT_ATTR(tag_alc_miss,	L2C_TAD_EVENT_TAG_ALC_MISS);
+EVENT_ATTR(tag_alc_nalc,	L2C_TAD_EVENT_TAG_ALC_NALC);
+EVENT_ATTR(tag_nalc_hit,	L2C_TAD_EVENT_TAG_NALC_HIT);
+EVENT_ATTR(tag_nalc_miss,	L2C_TAD_EVENT_TAG_NALC_MISS);
+
+EVENT_ATTR(lmc_wr,		L2C_TAD_EVENT_LMC_WR);
+EVENT_ATTR(lmc_sblkdty,		L2C_TAD_EVENT_LMC_SBLKDTY);
+
+EVENT_ATTR(tag_alc_rtg_hit,	L2C_TAD_EVENT_TAG_ALC_RTG_HIT);
+EVENT_ATTR(tag_alc_rtg_hite,	L2C_TAD_EVENT_TAG_ALC_RTG_HITE);
+EVENT_ATTR(tag_alc_rtg_hits,	L2C_TAD_EVENT_TAG_ALC_RTG_HITS);
+EVENT_ATTR(tag_alc_rtg_miss,	L2C_TAD_EVENT_TAG_ALC_RTG_MISS);
+EVENT_ATTR(tag_alc_nalc_rtg_hit, L2C_TAD_EVENT_TAG_NALC_RTG_HIT);
+EVENT_ATTR(tag_nalc_rtg_miss,	L2C_TAD_EVENT_TAG_NALC_RTG_MISS);
+EVENT_ATTR(tag_nalc_rtg_hite,	L2C_TAD_EVENT_TAG_NALC_RTG_HITE);
+EVENT_ATTR(tag_nalc_rtg_hits,	L2C_TAD_EVENT_TAG_NALC_RTG_HITS);
+EVENT_ATTR(tag_alc_lcl_evict,	L2C_TAD_EVENT_TAG_ALC_LCL_EVICT);
+EVENT_ATTR(tag_alc_lcl_clnvic,	L2C_TAD_EVENT_TAG_ALC_LCL_CLNVIC);
+EVENT_ATTR(tag_alc_lcl_dtyvic,	L2C_TAD_EVENT_TAG_ALC_LCL_DTYVIC);
+EVENT_ATTR(tag_alc_rmt_evict,	L2C_TAD_EVENT_TAG_ALC_RMT_EVICT);
+EVENT_ATTR(tag_alc_rmt_vic,	L2C_TAD_EVENT_TAG_ALC_RMT_VIC);
+
+EVENT_ATTR(rtg_alc,		L2C_TAD_EVENT_RTG_ALC);
+EVENT_ATTR(rtg_alc_hit,		L2C_TAD_EVENT_RTG_ALC_HIT);
+EVENT_ATTR(rtg_alc_hitwb,	L2C_TAD_EVENT_RTG_ALC_HITWB);
+
+EVENT_ATTR(stc_total,		L2C_TAD_EVENT_STC_TOTAL);
+EVENT_ATTR(stc_total_fail,	L2C_TAD_EVENT_STC_TOTAL_FAIL);
+EVENT_ATTR(stc_rmt,		L2C_TAD_EVENT_STC_RMT);
+EVENT_ATTR(stc_rmt_fail,	L2C_TAD_EVENT_STC_RMT_FAIL);
+EVENT_ATTR(stc_lcl,		L2C_TAD_EVENT_STC_LCL);
+EVENT_ATTR(stc_lcl_fail,	L2C_TAD_EVENT_STC_LCL_FAIL);
+
+EVENT_ATTR(oci_rtg_wait,	L2C_TAD_EVENT_OCI_RTG_WAIT);
+EVENT_ATTR(oci_fwd_cyc_hit,	L2C_TAD_EVENT_OCI_FWD_CYC_HIT);
+EVENT_ATTR(oci_fwd_race,	L2C_TAD_EVENT_OCI_FWD_RACE);
+EVENT_ATTR(oci_haks,		L2C_TAD_EVENT_OCI_HAKS);
+EVENT_ATTR(oci_fldx_tag_e_nodat, L2C_TAD_EVENT_OCI_FLDX_TAG_E_NODAT);
+EVENT_ATTR(oci_fldx_tag_e_dat,	L2C_TAD_EVENT_OCI_FLDX_TAG_E_DAT);
+EVENT_ATTR(oci_rldd,		L2C_TAD_EVENT_OCI_RLDD);
+EVENT_ATTR(oci_rldd_pemd,	L2C_TAD_EVENT_OCI_RLDD_PEMD);
+EVENT_ATTR(oci_rrq_dat_cnt,	L2C_TAD_EVENT_OCI_RRQ_DAT_CNT);
+EVENT_ATTR(oci_rrq_dat_dmask,	L2C_TAD_EVENT_OCI_RRQ_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_cnt,	L2C_TAD_EVENT_OCI_RSP_DAT_CNT);
+EVENT_ATTR(oci_rsp_dat_dmaks,	L2C_TAD_EVENT_OCI_RSP_DAT_DMASK);
+EVENT_ATTR(oci_rsp_dat_vicd_cnt, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_CNT);
+EVENT_ATTR(oci_rsp_dat_vicd_dmask, L2C_TAD_EVENT_OCI_RSP_DAT_VICD_DMASK);
+EVENT_ATTR(oci_rtg_alc_evict,	L2C_TAD_EVENT_OCI_RTG_ALC_EVICT);
+EVENT_ATTR(oci_rtg_alc_vic,	L2C_TAD_EVENT_OCI_RTG_ALC_VIC);
+
+static struct attribute *thunder_l2c_tad_pass2_events_attr[] = {
+	EVENT_PTR(l2t_hit),
+	EVENT_PTR(l2t_miss),
+	EVENT_PTR(l2t_noalloc),
+	EVENT_PTR(l2_vic),
+	EVENT_PTR(sc_fail),
+	EVENT_PTR(sc_pass),
+	EVENT_PTR(lfb_occ),
+	EVENT_PTR(wait_lfb),
+	EVENT_PTR(wait_vab),
+	EVENT_PTR(open_ccpi),
+
+	EVENT_PTR(lookup),
+	EVENT_PTR(lookup_xmc_lcl),
+	EVENT_PTR(lookup_xmc_rmt),
+	EVENT_PTR(lookup_mib),
+	EVENT_PTR(lookup_all),
+
+	EVENT_PTR(tag_alc_hit),
+	EVENT_PTR(tag_alc_miss),
+	EVENT_PTR(tag_alc_nalc),
+	EVENT_PTR(tag_nalc_hit),
+	EVENT_PTR(tag_nalc_miss),
+
+	EVENT_PTR(lmc_wr),
+	EVENT_PTR(lmc_sblkdty),
+
+	EVENT_PTR(tag_alc_rtg_hit),
+	EVENT_PTR(tag_alc_rtg_hite),
+	EVENT_PTR(tag_alc_rtg_hits),
+	EVENT_PTR(tag_alc_rtg_miss),
+	EVENT_PTR(tag_alc_nalc_rtg_hit),
+	EVENT_PTR(tag_nalc_rtg_miss),
+	EVENT_PTR(tag_nalc_rtg_hite),
+	EVENT_PTR(tag_nalc_rtg_hits),
+	EVENT_PTR(tag_alc_lcl_evict),
+	EVENT_PTR(tag_alc_lcl_clnvic),
+	EVENT_PTR(tag_alc_lcl_dtyvic),
+	EVENT_PTR(tag_alc_rmt_evict),
+	EVENT_PTR(tag_alc_rmt_vic),
+
+	EVENT_PTR(rtg_alc),
+	EVENT_PTR(rtg_alc_hit),
+	EVENT_PTR(rtg_alc_hitwb),
+
+	EVENT_PTR(stc_total),
+	EVENT_PTR(stc_total_fail),
+	EVENT_PTR(stc_rmt),
+	EVENT_PTR(stc_rmt_fail),
+	EVENT_PTR(stc_lcl),
+	EVENT_PTR(stc_lcl_fail),
+
+	EVENT_PTR(oci_rtg_wait),
+	EVENT_PTR(oci_fwd_cyc_hit),
+	EVENT_PTR(oci_fwd_race),
+	EVENT_PTR(oci_haks),
+	EVENT_PTR(oci_fldx_tag_e_nodat),
+	EVENT_PTR(oci_fldx_tag_e_dat),
+	EVENT_PTR(oci_rldd),
+	EVENT_PTR(oci_rldd_pemd),
+	EVENT_PTR(oci_rrq_dat_cnt),
+	EVENT_PTR(oci_rrq_dat_dmask),
+	EVENT_PTR(oci_rsp_dat_cnt),
+	EVENT_PTR(oci_rsp_dat_dmaks),
+	EVENT_PTR(oci_rsp_dat_vicd_cnt),
+	EVENT_PTR(oci_rsp_dat_vicd_dmask),
+	EVENT_PTR(oci_rtg_alc_evict),
+	EVENT_PTR(oci_rtg_alc_vic),
+
+	EVENT_PTR(qd0_idx),
+	EVENT_PTR(qd0_rdat),
+	EVENT_PTR(qd0_bnks),
+	EVENT_PTR(qd0_wdat),
+
+	EVENT_PTR(qd1_idx),
+	EVENT_PTR(qd1_rdat),
+	EVENT_PTR(qd1_bnks),
+	EVENT_PTR(qd1_wdat),
+
+	EVENT_PTR(qd2_idx),
+	EVENT_PTR(qd2_rdat),
+	EVENT_PTR(qd2_bnks),
+	EVENT_PTR(qd2_wdat),
+
+	EVENT_PTR(qd3_idx),
+	EVENT_PTR(qd3_rdat),
+	EVENT_PTR(qd3_bnks),
+	EVENT_PTR(qd3_wdat),
+
+	EVENT_PTR(qd4_idx),
+	EVENT_PTR(qd4_rdat),
+	EVENT_PTR(qd4_bnks),
+	EVENT_PTR(qd4_wdat),
+
+	EVENT_PTR(qd5_idx),
+	EVENT_PTR(qd5_rdat),
+	EVENT_PTR(qd5_bnks),
+	EVENT_PTR(qd5_wdat),
+
+	EVENT_PTR(qd6_idx),
+	EVENT_PTR(qd6_rdat),
+	EVENT_PTR(qd6_bnks),
+	EVENT_PTR(qd6_wdat),
+
+	EVENT_PTR(qd7_idx),
+	EVENT_PTR(qd7_rdat),
+	EVENT_PTR(qd7_bnks),
+	EVENT_PTR(qd7_wdat),
+	NULL,
+};
+
+static struct attribute_group thunder_l2c_tad_events_group = {
+	.name = "events",
+	.attrs = NULL,
+};
+
+static const struct attribute_group *thunder_l2c_tad_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_l2c_tad_format_group,
+	&thunder_l2c_tad_events_group,
+	NULL,
+};
+
+struct pmu thunder_l2c_tad_pmu = {
+	.attr_groups	= thunder_l2c_tad_attr_groups,
+	.name		= "thunder_l2c_tad",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+	if ((config > 0 && config <= L2C_TAD_EVENT_WAIT_VAB) ||
+	    config == L2C_TAD_EVENT_RTG_HIT ||
+	    config == L2C_TAD_EVENT_RTG_MISS ||
+	    config == L2C_TAD_EVENT_L2_RTG_VIC ||
+	    config == L2C_TAD_EVENT_L2_OPEN_OCI ||
+	    ((config & 0x80) && ((config & 0xf) <= 3)))
+		return 1;
+
+	if (thunder_uncore_version == 1)
+		if (config == L2C_TAD_EVENT_OPEN_CCPI ||
+		    (config >= L2C_TAD_EVENT_LOOKUP &&
+		     config <= L2C_TAD_EVENT_LOOKUP_ALL) ||
+		    (config >= L2C_TAD_EVENT_TAG_ALC_HIT &&
+		     config <= L2C_TAD_EVENT_OCI_RTG_ALC_VIC &&
+		     config != 0x4d &&
+		     config != 0x66 &&
+		     config != 0x67))
+			return 1;
+
+	return 0;
+}
+
+int __init thunder_uncore_l2c_tad_setup(void)
+{
+	int ret = -ENOMEM;
+
+	thunder_uncore_l2c_tad = kzalloc(sizeof(struct thunder_uncore),
+					 GFP_KERNEL);
+	if (!thunder_uncore_l2c_tad)
+		goto fail_nomem;
+
+	if (thunder_uncore_version == 0)
+		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_events_attr;
+	else /* default */
+		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_pass2_events_attr;
+
+	ret = thunder_uncore_setup(thunder_uncore_l2c_tad,
+			   PCI_DEVICE_ID_THUNDER_L2C_TAD,
+			   L2C_TAD_CONTROL_OFFSET,
+			   L2C_TAD_COUNTER_OFFSET + L2C_TAD_NR_COUNTERS
+				* sizeof(unsigned long long),
+			   &thunder_l2c_tad_pmu,
+			   L2C_TAD_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_l2c_tad->type = L2C_TAD_TYPE;
+	thunder_uncore_l2c_tad->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_l2c_tad);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* [PATCH v2 3/5] arm64/perf: Cavium ThunderX L2C CBC uncore support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-03-09 16:21   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Support the counters of the L2 cache crossbar connect (CBC).

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/uncore/Makefile                |   3 +-
 drivers/perf/uncore/uncore_cavium.c         |   3 +
 drivers/perf/uncore/uncore_cavium.h         |   4 +
 drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 ++++++++++++++++++++++++++++
 4 files changed, 246 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c

diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index 6a16caf..d52ecc9 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,2 +1,3 @@
 obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
-			      uncore_cavium_l2c_tad.o
+			      uncore_cavium_l2c_tad.o	\
+			      uncore_cavium_l2c_cbc.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index b92b2ae..a230450 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -17,6 +17,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
 {
 	if (event->pmu->type == thunder_l2c_tad_pmu.type)
 		return thunder_uncore_l2c_tad;
+	else if (event->pmu->type == thunder_l2c_cbc_pmu.type)
+		return thunder_uncore_l2c_cbc;
 	else
 		return NULL;
 }
@@ -300,6 +302,7 @@ static int __init thunder_uncore_init(void)
 	pr_info("PMU version: %d\n", thunder_uncore_version);
 
 	thunder_uncore_l2c_tad_setup();
+	thunder_uncore_l2c_cbc_setup();
 	return 0;
 }
 late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index 7a9c367..94bd02c 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -8,6 +8,7 @@
 
 enum uncore_type {
 	L2C_TAD_TYPE,
+	L2C_CBC_TYPE,
 };
 
 extern int thunder_uncore_version;
@@ -66,7 +67,9 @@ extern struct attribute_group thunder_uncore_attr_group;
 extern struct device_attribute format_attr_node;
 
 extern struct thunder_uncore *thunder_uncore_l2c_tad;
+extern struct thunder_uncore *thunder_uncore_l2c_cbc;
 extern struct pmu thunder_l2c_tad_pmu;
+extern struct pmu thunder_l2c_cbc_pmu;
 
 /* Prototypes */
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -81,3 +84,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
 				  char *page);
 
 int thunder_uncore_l2c_tad_setup(void);
+int thunder_uncore_l2c_cbc_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_l2c_cbc.c b/drivers/perf/uncore/uncore_cavium_l2c_cbc.c
new file mode 100644
index 0000000..bde7a51
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_l2c_cbc.c
@@ -0,0 +1,237 @@
+/*
+ * Cavium Thunder uncore PMU support, L2C CBC counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_L2C_CBC
+#define PCI_DEVICE_ID_THUNDER_L2C_CBC	0xa02f
+#endif
+
+#define L2C_CBC_NR_COUNTERS             16
+
+/* L2C CBC event list */
+#define L2C_CBC_EVENT_XMC0		0x00
+#define L2C_CBC_EVENT_XMD0		0x01
+#define L2C_CBC_EVENT_RSC0		0x02
+#define L2C_CBC_EVENT_RSD0		0x03
+#define L2C_CBC_EVENT_INV0		0x04
+#define L2C_CBC_EVENT_IOC0		0x05
+#define L2C_CBC_EVENT_IOR0		0x06
+
+#define L2C_CBC_EVENT_XMC1		0x08	/* 0x40 */
+#define L2C_CBC_EVENT_XMD1		0x09
+#define L2C_CBC_EVENT_RSC1		0x0a
+#define L2C_CBC_EVENT_RSD1		0x0b
+#define L2C_CBC_EVENT_INV1		0x0c
+
+#define L2C_CBC_EVENT_XMC2		0x10	/* 0x80 */
+#define L2C_CBC_EVENT_XMD2		0x11
+#define L2C_CBC_EVENT_RSC2		0x12
+#define L2C_CBC_EVENT_RSD2		0x13
+
+struct thunder_uncore *thunder_uncore_l2c_cbc;
+
+int l2c_cbc_events[L2C_CBC_NR_COUNTERS] = {
+	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
+	0x08, 0x09, 0x0a, 0x0b, 0x0c,
+	0x10, 0x11, 0x12, 0x13
+};
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	struct thunder_uncore_unit *unit;
+	u64 prev;
+
+	node = get_node(hwc->config, uncore);
+
+	/* restore counter value divided by units into all counters */
+	if (flags & PERF_EF_RELOAD) {
+		prev = local64_read(&hwc->prev_count);
+		prev = prev / node->nr_units;
+
+		list_for_each_entry(unit, &node->unit_list, entry)
+			writeq(prev, hwc->event_base + unit->map);
+	}
+
+	hwc->state = 0;
+	perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		thunder_uncore_read(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int id, i;
+
+	WARN_ON_ONCE(!uncore);
+	node = get_node(hwc->config, uncore);
+	id = get_id(hwc->config);
+
+	/* are we already assigned? */
+	if (hwc->idx != -1 && node->events[hwc->idx] == event)
+		goto out;
+
+	for (i = 0; i < node->num_counters; i++) {
+		if (node->events[i] == event) {
+			hwc->idx = i;
+			goto out;
+		}
+	}
+
+	/* these counters are self-sustained so idx must match the counter! */
+	hwc->idx = -1;
+	for (i = 0; i < node->num_counters; i++) {
+		if (l2c_cbc_events[i] == id) {
+			if (cmpxchg(&node->events[i], NULL, event) == NULL) {
+				hwc->idx = i;
+				break;
+			}
+		}
+	}
+
+out:
+	if (hwc->idx == -1)
+		return -EBUSY;
+
+	hwc->event_base = id * sizeof(unsigned long long);
+
+	/* counters are not stoppable, so avoid PERF_HES_STOPPED */
+	hwc->state = PERF_HES_UPTODATE;
+
+	if (flags & PERF_EF_START)
+		thunder_uncore_start(event, 0);
+
+	return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-4");
+
+static struct attribute *thunder_l2c_cbc_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_node.attr,
+	NULL,
+};
+
+static struct attribute_group thunder_l2c_cbc_format_group = {
+	.name = "format",
+	.attrs = thunder_l2c_cbc_format_attr,
+};
+
+EVENT_ATTR(xmc0,	L2C_CBC_EVENT_XMC0);
+EVENT_ATTR(xmd0,	L2C_CBC_EVENT_XMD0);
+EVENT_ATTR(rsc0,	L2C_CBC_EVENT_RSC0);
+EVENT_ATTR(rsd0,	L2C_CBC_EVENT_RSD0);
+EVENT_ATTR(inv0,	L2C_CBC_EVENT_INV0);
+EVENT_ATTR(ioc0,	L2C_CBC_EVENT_IOC0);
+EVENT_ATTR(ior0,	L2C_CBC_EVENT_IOR0);
+EVENT_ATTR(xmc1,	L2C_CBC_EVENT_XMC1);
+EVENT_ATTR(xmd1,	L2C_CBC_EVENT_XMD1);
+EVENT_ATTR(rsc1,	L2C_CBC_EVENT_RSC1);
+EVENT_ATTR(rsd1,	L2C_CBC_EVENT_RSD1);
+EVENT_ATTR(inv1,	L2C_CBC_EVENT_INV1);
+EVENT_ATTR(xmc2,	L2C_CBC_EVENT_XMC2);
+EVENT_ATTR(xmd2,	L2C_CBC_EVENT_XMD2);
+EVENT_ATTR(rsc2,	L2C_CBC_EVENT_RSC2);
+EVENT_ATTR(rsd2,	L2C_CBC_EVENT_RSD2);
+
+static struct attribute *thunder_l2c_cbc_events_attr[] = {
+	EVENT_PTR(xmc0),
+	EVENT_PTR(xmd0),
+	EVENT_PTR(rsc0),
+	EVENT_PTR(rsd0),
+	EVENT_PTR(inv0),
+	EVENT_PTR(ioc0),
+	EVENT_PTR(ior0),
+	EVENT_PTR(xmc1),
+	EVENT_PTR(xmd1),
+	EVENT_PTR(rsc1),
+	EVENT_PTR(rsd1),
+	EVENT_PTR(inv1),
+	EVENT_PTR(xmc2),
+	EVENT_PTR(xmd2),
+	EVENT_PTR(rsc2),
+	EVENT_PTR(rsd2),
+	NULL,
+};
+
+static struct attribute_group thunder_l2c_cbc_events_group = {
+	.name = "events",
+	.attrs = thunder_l2c_cbc_events_attr,
+};
+
+static const struct attribute_group *thunder_l2c_cbc_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_l2c_cbc_format_group,
+	&thunder_l2c_cbc_events_group,
+	NULL,
+};
+
+struct pmu thunder_l2c_cbc_pmu = {
+	.attr_groups	= thunder_l2c_cbc_attr_groups,
+	.name		= "thunder_l2c_cbc",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+	if (config <= L2C_CBC_EVENT_IOR0 ||
+	    (config >= L2C_CBC_EVENT_XMC1 && config <= L2C_CBC_EVENT_INV1) ||
+	    (config >= L2C_CBC_EVENT_XMC2 && config <= L2C_CBC_EVENT_RSD2))
+		return 1;
+	else
+		return 0;
+}
+
+int __init thunder_uncore_l2c_cbc_setup(void)
+{
+	int ret = -ENOMEM;
+
+	thunder_uncore_l2c_cbc = kzalloc(sizeof(struct thunder_uncore),
+					 GFP_KERNEL);
+	if (!thunder_uncore_l2c_cbc)
+		goto fail_nomem;
+
+	ret = thunder_uncore_setup(thunder_uncore_l2c_cbc,
+				   PCI_DEVICE_ID_THUNDER_L2C_CBC,
+				   0,
+				   0x100,
+				   &thunder_l2c_cbc_pmu,
+				   L2C_CBC_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_l2c_cbc->type = L2C_CBC_TYPE;
+	thunder_uncore_l2c_cbc->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_l2c_cbc);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1


* [PATCH v2 4/5] arm64/perf: Cavium ThunderX LMC uncore support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-03-09 16:21   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Support the counters on the DRAM (LMC) controllers.

Also support the extra counters added in pass 2 silicon, detected by
checking the MIDR.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/uncore/Makefile            |   3 +-
 drivers/perf/uncore/uncore_cavium.c     |   3 +
 drivers/perf/uncore/uncore_cavium.h     |   4 +
 drivers/perf/uncore/uncore_cavium_lmc.c | 196 ++++++++++++++++++++++++++++++++
 4 files changed, 205 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c

diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index d52ecc9..81479e8 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,3 +1,4 @@
 obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
 			      uncore_cavium_l2c_tad.o	\
-			      uncore_cavium_l2c_cbc.o
+			      uncore_cavium_l2c_cbc.o	\
+			      uncore_cavium_lmc.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index a230450..45c81d0 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -19,6 +19,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
 		return thunder_uncore_l2c_tad;
 	else if (event->pmu->type == thunder_l2c_cbc_pmu.type)
 		return thunder_uncore_l2c_cbc;
+	else if (event->pmu->type == thunder_lmc_pmu.type)
+		return thunder_uncore_lmc;
 	else
 		return NULL;
 }
@@ -303,6 +305,7 @@ static int __init thunder_uncore_init(void)
 
 	thunder_uncore_l2c_tad_setup();
 	thunder_uncore_l2c_cbc_setup();
+	thunder_uncore_lmc_setup();
 	return 0;
 }
 late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index 94bd02c..f14f6be 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -9,6 +9,7 @@
 enum uncore_type {
 	L2C_TAD_TYPE,
 	L2C_CBC_TYPE,
+	LMC_TYPE,
 };
 
 extern int thunder_uncore_version;
@@ -68,8 +69,10 @@ extern struct device_attribute format_attr_node;
 
 extern struct thunder_uncore *thunder_uncore_l2c_tad;
 extern struct thunder_uncore *thunder_uncore_l2c_cbc;
+extern struct thunder_uncore *thunder_uncore_lmc;
 extern struct pmu thunder_l2c_tad_pmu;
 extern struct pmu thunder_l2c_cbc_pmu;
+extern struct pmu thunder_lmc_pmu;
 
 /* Prototypes */
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -85,3 +88,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
 
 int thunder_uncore_l2c_tad_setup(void);
 int thunder_uncore_l2c_cbc_setup(void);
+int thunder_uncore_lmc_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_lmc.c b/drivers/perf/uncore/uncore_cavium_lmc.c
new file mode 100644
index 0000000..b8d21b4
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_lmc.c
@@ -0,0 +1,196 @@
+/*
+ * Cavium Thunder uncore PMU support, LMC counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_LMC
+#define PCI_DEVICE_ID_THUNDER_LMC	0xa022
+#endif
+
+#define LMC_NR_COUNTERS			3
+#define LMC_PASS2_NR_COUNTERS		5
+#define LMC_MAX_NR_COUNTERS		LMC_PASS2_NR_COUNTERS
+
+/* LMC event list */
+#define LMC_EVENT_IFB_CNT		0
+#define LMC_EVENT_OPS_CNT		1
+#define LMC_EVENT_DCLK_CNT		2
+
+/* pass 2 added counters */
+#define LMC_EVENT_BANK_CONFLICT1	3
+#define LMC_EVENT_BANK_CONFLICT2	4
+
+#define LMC_COUNTER_START		LMC_EVENT_IFB_CNT
+#define LMC_COUNTER_END			(LMC_EVENT_BANK_CONFLICT2 + 8)
+
+struct thunder_uncore *thunder_uncore_lmc;
+
+int lmc_events[LMC_MAX_NR_COUNTERS] = { 0x1d0, 0x1d8, 0x1e0, 0x360, 0x368 };
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->state = 0;
+	perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	hwc->state |= PERF_HES_STOPPED;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		thunder_uncore_read(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int id, i;
+
+	WARN_ON_ONCE(!uncore);
+	node = get_node(hwc->config, uncore);
+	id = get_id(hwc->config);
+
+	/* are we already assigned? */
+	if (hwc->idx != -1 && node->events[hwc->idx] == event)
+		goto out;
+
+	for (i = 0; i < node->num_counters; i++) {
+		if (node->events[i] == event) {
+			hwc->idx = i;
+			goto out;
+		}
+	}
+
+	/* these counters are self-sustained so idx must match the counter! */
+	hwc->idx = -1;
+	if (cmpxchg(&node->events[id], NULL, event) == NULL)
+		hwc->idx = id;
+
+out:
+	if (hwc->idx == -1)
+		return -EBUSY;
+
+	hwc->event_base = lmc_events[id];
+	hwc->state = PERF_HES_UPTODATE;
+
+	/* counters are read-only, so avoid PERF_EF_RELOAD */
+	if (flags & PERF_EF_START)
+		thunder_uncore_start(event, 0);
+
+	return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-2");
+
+static struct attribute *thunder_lmc_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_node.attr,
+	NULL,
+};
+
+static struct attribute_group thunder_lmc_format_group = {
+	.name = "format",
+	.attrs = thunder_lmc_format_attr,
+};
+
+EVENT_ATTR(ifb_cnt,		LMC_EVENT_IFB_CNT);
+EVENT_ATTR(ops_cnt,		LMC_EVENT_OPS_CNT);
+EVENT_ATTR(dclk_cnt,		LMC_EVENT_DCLK_CNT);
+EVENT_ATTR(bank_conflict1,	LMC_EVENT_BANK_CONFLICT1);
+EVENT_ATTR(bank_conflict2,	LMC_EVENT_BANK_CONFLICT2);
+
+static struct attribute *thunder_lmc_events_attr[] = {
+	EVENT_PTR(ifb_cnt),
+	EVENT_PTR(ops_cnt),
+	EVENT_PTR(dclk_cnt),
+	NULL,
+};
+
+static struct attribute *thunder_lmc_pass2_events_attr[] = {
+	EVENT_PTR(ifb_cnt),
+	EVENT_PTR(ops_cnt),
+	EVENT_PTR(dclk_cnt),
+	EVENT_PTR(bank_conflict1),
+	EVENT_PTR(bank_conflict2),
+	NULL,
+};
+
+static struct attribute_group thunder_lmc_events_group = {
+	.name = "events",
+	.attrs = NULL,
+};
+
+static const struct attribute_group *thunder_lmc_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_lmc_format_group,
+	&thunder_lmc_events_group,
+	NULL,
+};
+
+struct pmu thunder_lmc_pmu = {
+	.attr_groups	= thunder_lmc_attr_groups,
+	.name		= "thunder_lmc",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read,
+};
+
+static int event_valid(u64 config)
+{
+	if (config <= LMC_EVENT_DCLK_CNT)
+		return 1;
+
+	if (thunder_uncore_version == 1)
+		if (config == LMC_EVENT_BANK_CONFLICT1 ||
+		    config == LMC_EVENT_BANK_CONFLICT2)
+			return 1;
+	return 0;
+}
+
+int __init thunder_uncore_lmc_setup(void)
+{
+	int ret = -ENOMEM;
+
+	thunder_uncore_lmc = kzalloc(sizeof(struct thunder_uncore), GFP_KERNEL);
+	if (!thunder_uncore_lmc)
+		goto fail_nomem;
+
+	/* anything but pass 1 (version != 0) also gets the bank conflict counters */
+	thunder_lmc_events_group.attrs = (thunder_uncore_version == 0) ?
+		thunder_lmc_events_attr : thunder_lmc_pass2_events_attr;
+
+	ret = thunder_uncore_setup(thunder_uncore_lmc,
+				   PCI_DEVICE_ID_THUNDER_LMC,
+				   LMC_COUNTER_START,
+				   LMC_COUNTER_END - LMC_COUNTER_START,
+				   &thunder_lmc_pmu,
+				   (thunder_uncore_version == 1) ?
+					LMC_PASS2_NR_COUNTERS : LMC_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_lmc->type = LMC_TYPE;
+	thunder_uncore_lmc->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_lmc);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread
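The slot assignment in thunder_uncore_add() above relies on the LMC counters being fixed-purpose: an event may only occupy the array slot matching its event id, and that slot is claimed atomically with cmpxchg(). A minimal user-space C model of that claim (illustrative names, C11 atomics standing in for the kernel's cmpxchg; not the kernel API):

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define SKETCH_NR_COUNTERS 5	/* mirrors LMC_MAX_NR_COUNTERS */

/* One slot per fixed-purpose counter, claimed atomically. */
struct sketch_node {
	_Atomic(void *) events[SKETCH_NR_COUNTERS];
};

/* Returns the claimed index (== id), or -1 if the slot is taken. */
static int sketch_claim(struct sketch_node *node, int id, void *event)
{
	void *expected = NULL;

	if (atomic_compare_exchange_strong(&node->events[id], &expected,
					   event))
		return id;
	return -1;
}
```

The important property is that the returned index is the event id itself, never a free slot chosen independently, since each hardware counter counts exactly one thing.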

* [PATCH v2 5/5] arm64/perf: Cavium ThunderX OCX TLK uncore support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-03-09 16:21   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel, Jan Glauber

Add support for the OCX transmit link (TLK) counters.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/uncore/Makefile                |   3 +-
 drivers/perf/uncore/uncore_cavium.c         |   3 +
 drivers/perf/uncore/uncore_cavium.h         |   4 +
 drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++++++++++++
 4 files changed, 389 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c

diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index 81479e8..88d1f57 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
 			      uncore_cavium_l2c_tad.o	\
 			      uncore_cavium_l2c_cbc.o	\
-			      uncore_cavium_lmc.o
+			      uncore_cavium_lmc.o	\
+			      uncore_cavium_ocx_tlk.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index 45c81d0..e210457 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -21,6 +21,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
 		return thunder_uncore_l2c_cbc;
 	else if (event->pmu->type == thunder_lmc_pmu.type)
 		return thunder_uncore_lmc;
+	else if (event->pmu->type == thunder_ocx_tlk_pmu.type)
+		return thunder_uncore_ocx_tlk;
 	else
 		return NULL;
 }
@@ -306,6 +308,7 @@ static int __init thunder_uncore_init(void)
 	thunder_uncore_l2c_tad_setup();
 	thunder_uncore_l2c_cbc_setup();
 	thunder_uncore_lmc_setup();
+	thunder_uncore_ocx_tlk_setup();
 	return 0;
 }
 late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index f14f6be..78e95c7 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -10,6 +10,7 @@ enum uncore_type {
 	L2C_TAD_TYPE,
 	L2C_CBC_TYPE,
 	LMC_TYPE,
+	OCX_TLK_TYPE,
 };
 
 extern int thunder_uncore_version;
@@ -70,9 +71,11 @@ extern struct device_attribute format_attr_node;
 extern struct thunder_uncore *thunder_uncore_l2c_tad;
 extern struct thunder_uncore *thunder_uncore_l2c_cbc;
 extern struct thunder_uncore *thunder_uncore_lmc;
+extern struct thunder_uncore *thunder_uncore_ocx_tlk;
 extern struct pmu thunder_l2c_tad_pmu;
 extern struct pmu thunder_l2c_cbc_pmu;
 extern struct pmu thunder_lmc_pmu;
+extern struct pmu thunder_ocx_tlk_pmu;
 
 /* Prototypes */
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -89,3 +92,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
 int thunder_uncore_l2c_tad_setup(void);
 int thunder_uncore_l2c_cbc_setup(void);
 int thunder_uncore_lmc_setup(void);
+int thunder_uncore_ocx_tlk_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_ocx_tlk.c b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
new file mode 100644
index 0000000..02f1bc1
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
@@ -0,0 +1,380 @@
+/*
+ * Cavium Thunder uncore PMU support, OCX TLK counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_OCX
+#define PCI_DEVICE_ID_THUNDER_OCX		0xa013
+#endif
+
+#define OCX_TLK_NR_UNITS			3
+#define OCX_TLK_UNIT_OFFSET			0x2000
+#define OCX_TLK_CONTROL_OFFSET			0x10040
+#define OCX_TLK_COUNTER_OFFSET			0x10400
+
+#define OCX_TLK_STAT_DISABLE			0
+#define OCX_TLK_STAT_ENABLE			1
+
+/* OCX TLK event list */
+#define OCX_TLK_EVENT_STAT_IDLE_CNT		0x00
+#define OCX_TLK_EVENT_STAT_DATA_CNT		0x01
+#define OCX_TLK_EVENT_STAT_SYNC_CNT		0x02
+#define OCX_TLK_EVENT_STAT_RETRY_CNT		0x03
+#define OCX_TLK_EVENT_STAT_ERR_CNT		0x04
+
+#define OCX_TLK_EVENT_STAT_MAT0_CNT		0x08
+#define OCX_TLK_EVENT_STAT_MAT1_CNT		0x09
+#define OCX_TLK_EVENT_STAT_MAT2_CNT		0x0a
+#define OCX_TLK_EVENT_STAT_MAT3_CNT		0x0b
+
+#define OCX_TLK_EVENT_STAT_VC0_CMD		0x10
+#define OCX_TLK_EVENT_STAT_VC1_CMD		0x11
+#define OCX_TLK_EVENT_STAT_VC2_CMD		0x12
+#define OCX_TLK_EVENT_STAT_VC3_CMD		0x13
+#define OCX_TLK_EVENT_STAT_VC4_CMD		0x14
+#define OCX_TLK_EVENT_STAT_VC5_CMD		0x15
+
+#define OCX_TLK_EVENT_STAT_VC0_PKT		0x20
+#define OCX_TLK_EVENT_STAT_VC1_PKT		0x21
+#define OCX_TLK_EVENT_STAT_VC2_PKT		0x22
+#define OCX_TLK_EVENT_STAT_VC3_PKT		0x23
+#define OCX_TLK_EVENT_STAT_VC4_PKT		0x24
+#define OCX_TLK_EVENT_STAT_VC5_PKT		0x25
+#define OCX_TLK_EVENT_STAT_VC6_PKT		0x26
+#define OCX_TLK_EVENT_STAT_VC7_PKT		0x27
+#define OCX_TLK_EVENT_STAT_VC8_PKT		0x28
+#define OCX_TLK_EVENT_STAT_VC9_PKT		0x29
+#define OCX_TLK_EVENT_STAT_VC10_PKT		0x2a
+#define OCX_TLK_EVENT_STAT_VC11_PKT		0x2b
+#define OCX_TLK_EVENT_STAT_VC12_PKT		0x2c
+#define OCX_TLK_EVENT_STAT_VC13_PKT		0x2d
+
+#define OCX_TLK_EVENT_STAT_VC0_CON		0x30
+#define OCX_TLK_EVENT_STAT_VC1_CON		0x31
+#define OCX_TLK_EVENT_STAT_VC2_CON		0x32
+#define OCX_TLK_EVENT_STAT_VC3_CON		0x33
+#define OCX_TLK_EVENT_STAT_VC4_CON		0x34
+#define OCX_TLK_EVENT_STAT_VC5_CON		0x35
+#define OCX_TLK_EVENT_STAT_VC6_CON		0x36
+#define OCX_TLK_EVENT_STAT_VC7_CON		0x37
+#define OCX_TLK_EVENT_STAT_VC8_CON		0x38
+#define OCX_TLK_EVENT_STAT_VC9_CON		0x39
+#define OCX_TLK_EVENT_STAT_VC10_CON		0x3a
+#define OCX_TLK_EVENT_STAT_VC11_CON		0x3b
+#define OCX_TLK_EVENT_STAT_VC12_CON		0x3c
+#define OCX_TLK_EVENT_STAT_VC13_CON		0x3d
+
+#define OCX_TLK_MAX_COUNTER			OCX_TLK_EVENT_STAT_VC13_CON
+#define OCX_TLK_NR_COUNTERS			OCX_TLK_MAX_COUNTER
+
+struct thunder_uncore *thunder_uncore_ocx_tlk;
+
+/*
+ * The OCX devices have a single device per node, therefore picking the
+ * first device from the list is correct.
+ */
+static inline void __iomem *map_offset(struct thunder_uncore_node *node,
+				       unsigned long addr, int offset, int nr)
+{
+	struct thunder_uncore_unit *unit;
+
+	unit = list_first_entry(&node->unit_list, struct thunder_uncore_unit,
+				entry);
+	return (void __iomem *) (addr + unit->map + nr * offset);
+}
+
+static void __iomem *map_offset_ocx_tlk(struct thunder_uncore_node *node,
+					unsigned long addr, int nr)
+{
+	return map_offset(node, addr, OCX_TLK_UNIT_OFFSET, nr);
+}
+
+/*
+ * Sum the counters across all TLKs. This differs from the other uncore
+ * PMUs because all TLKs sit on a single PCI device.
+ */
+static void thunder_uncore_read_ocx_tlk(struct perf_event *event)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	u64 prev, new = 0;
+	s64 delta;
+	int i;
+
+	/*
+	 * No counter overflow interrupts so we do not
+	 * have to worry about prev_count changing on us.
+	 */
+
+	prev = local64_read(&hwc->prev_count);
+
+	/* read counter values from all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		new += readq(map_offset_ocx_tlk(node, hwc->event_base, i));
+
+	local64_set(&hwc->prev_count, new);
+	delta = new - prev;
+	local64_add(delta, &event->count);
+}
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	hwc->state = 0;
+
+	/* enable counters on all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		writeb(OCX_TLK_STAT_ENABLE,
+		       map_offset_ocx_tlk(node, hwc->config_base, i));
+
+	perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	/* disable counters on all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		writeb(OCX_TLK_STAT_DISABLE,
+		       map_offset_ocx_tlk(node, hwc->config_base, i));
+	hwc->state |= PERF_HES_STOPPED;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		thunder_uncore_read_ocx_tlk(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int id, i;
+
+	WARN_ON_ONCE(!uncore);
+	node = get_node(hwc->config, uncore);
+	id = get_id(hwc->config);
+
+	/* are we already assigned? */
+	if (hwc->idx != -1 && node->events[hwc->idx] == event)
+		goto out;
+
+	for (i = 0; i < node->num_counters; i++) {
+		if (node->events[i] == event) {
+			hwc->idx = i;
+			goto out;
+		}
+	}
+
+	/* counters are 1:1 */
+	hwc->idx = -1;
+	if (cmpxchg(&node->events[id], NULL, event) == NULL)
+		hwc->idx = id;
+
+out:
+	if (hwc->idx == -1)
+		return -EBUSY;
+
+	hwc->config_base = 0;
+	hwc->event_base = OCX_TLK_COUNTER_OFFSET - OCX_TLK_CONTROL_OFFSET +
+			hwc->idx * sizeof(unsigned long long);
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	if (flags & PERF_EF_START)
+		thunder_uncore_start(event, PERF_EF_RELOAD);
+	return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-5");
+
+static struct attribute *thunder_ocx_tlk_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_node.attr,
+	NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_format_group = {
+	.name = "format",
+	.attrs = thunder_ocx_tlk_format_attr,
+};
+
+EVENT_ATTR(idle_cnt,	OCX_TLK_EVENT_STAT_IDLE_CNT);
+EVENT_ATTR(data_cnt,	OCX_TLK_EVENT_STAT_DATA_CNT);
+EVENT_ATTR(sync_cnt,	OCX_TLK_EVENT_STAT_SYNC_CNT);
+EVENT_ATTR(retry_cnt,	OCX_TLK_EVENT_STAT_RETRY_CNT);
+EVENT_ATTR(err_cnt,	OCX_TLK_EVENT_STAT_ERR_CNT);
+EVENT_ATTR(mat0_cnt,	OCX_TLK_EVENT_STAT_MAT0_CNT);
+EVENT_ATTR(mat1_cnt,	OCX_TLK_EVENT_STAT_MAT1_CNT);
+EVENT_ATTR(mat2_cnt,	OCX_TLK_EVENT_STAT_MAT2_CNT);
+EVENT_ATTR(mat3_cnt,	OCX_TLK_EVENT_STAT_MAT3_CNT);
+EVENT_ATTR(vc0_cmd,	OCX_TLK_EVENT_STAT_VC0_CMD);
+EVENT_ATTR(vc1_cmd,	OCX_TLK_EVENT_STAT_VC1_CMD);
+EVENT_ATTR(vc2_cmd,	OCX_TLK_EVENT_STAT_VC2_CMD);
+EVENT_ATTR(vc3_cmd,	OCX_TLK_EVENT_STAT_VC3_CMD);
+EVENT_ATTR(vc4_cmd,	OCX_TLK_EVENT_STAT_VC4_CMD);
+EVENT_ATTR(vc5_cmd,	OCX_TLK_EVENT_STAT_VC5_CMD);
+EVENT_ATTR(vc0_pkt,	OCX_TLK_EVENT_STAT_VC0_PKT);
+EVENT_ATTR(vc1_pkt,	OCX_TLK_EVENT_STAT_VC1_PKT);
+EVENT_ATTR(vc2_pkt,	OCX_TLK_EVENT_STAT_VC2_PKT);
+EVENT_ATTR(vc3_pkt,	OCX_TLK_EVENT_STAT_VC3_PKT);
+EVENT_ATTR(vc4_pkt,	OCX_TLK_EVENT_STAT_VC4_PKT);
+EVENT_ATTR(vc5_pkt,	OCX_TLK_EVENT_STAT_VC5_PKT);
+EVENT_ATTR(vc6_pkt,	OCX_TLK_EVENT_STAT_VC6_PKT);
+EVENT_ATTR(vc7_pkt,	OCX_TLK_EVENT_STAT_VC7_PKT);
+EVENT_ATTR(vc8_pkt,	OCX_TLK_EVENT_STAT_VC8_PKT);
+EVENT_ATTR(vc9_pkt,	OCX_TLK_EVENT_STAT_VC9_PKT);
+EVENT_ATTR(vc10_pkt,	OCX_TLK_EVENT_STAT_VC10_PKT);
+EVENT_ATTR(vc11_pkt,	OCX_TLK_EVENT_STAT_VC11_PKT);
+EVENT_ATTR(vc12_pkt,	OCX_TLK_EVENT_STAT_VC12_PKT);
+EVENT_ATTR(vc13_pkt,	OCX_TLK_EVENT_STAT_VC13_PKT);
+EVENT_ATTR(vc0_con,	OCX_TLK_EVENT_STAT_VC0_CON);
+EVENT_ATTR(vc1_con,	OCX_TLK_EVENT_STAT_VC1_CON);
+EVENT_ATTR(vc2_con,	OCX_TLK_EVENT_STAT_VC2_CON);
+EVENT_ATTR(vc3_con,	OCX_TLK_EVENT_STAT_VC3_CON);
+EVENT_ATTR(vc4_con,	OCX_TLK_EVENT_STAT_VC4_CON);
+EVENT_ATTR(vc5_con,	OCX_TLK_EVENT_STAT_VC5_CON);
+EVENT_ATTR(vc6_con,	OCX_TLK_EVENT_STAT_VC6_CON);
+EVENT_ATTR(vc7_con,	OCX_TLK_EVENT_STAT_VC7_CON);
+EVENT_ATTR(vc8_con,	OCX_TLK_EVENT_STAT_VC8_CON);
+EVENT_ATTR(vc9_con,	OCX_TLK_EVENT_STAT_VC9_CON);
+EVENT_ATTR(vc10_con,	OCX_TLK_EVENT_STAT_VC10_CON);
+EVENT_ATTR(vc11_con,	OCX_TLK_EVENT_STAT_VC11_CON);
+EVENT_ATTR(vc12_con,	OCX_TLK_EVENT_STAT_VC12_CON);
+EVENT_ATTR(vc13_con,	OCX_TLK_EVENT_STAT_VC13_CON);
+
+static struct attribute *thunder_ocx_tlk_events_attr[] = {
+	EVENT_PTR(idle_cnt),
+	EVENT_PTR(data_cnt),
+	EVENT_PTR(sync_cnt),
+	EVENT_PTR(retry_cnt),
+	EVENT_PTR(err_cnt),
+	EVENT_PTR(mat0_cnt),
+	EVENT_PTR(mat1_cnt),
+	EVENT_PTR(mat2_cnt),
+	EVENT_PTR(mat3_cnt),
+	EVENT_PTR(vc0_cmd),
+	EVENT_PTR(vc1_cmd),
+	EVENT_PTR(vc2_cmd),
+	EVENT_PTR(vc3_cmd),
+	EVENT_PTR(vc4_cmd),
+	EVENT_PTR(vc5_cmd),
+	EVENT_PTR(vc0_pkt),
+	EVENT_PTR(vc1_pkt),
+	EVENT_PTR(vc2_pkt),
+	EVENT_PTR(vc3_pkt),
+	EVENT_PTR(vc4_pkt),
+	EVENT_PTR(vc5_pkt),
+	EVENT_PTR(vc6_pkt),
+	EVENT_PTR(vc7_pkt),
+	EVENT_PTR(vc8_pkt),
+	EVENT_PTR(vc9_pkt),
+	EVENT_PTR(vc10_pkt),
+	EVENT_PTR(vc11_pkt),
+	EVENT_PTR(vc12_pkt),
+	EVENT_PTR(vc13_pkt),
+	EVENT_PTR(vc0_con),
+	EVENT_PTR(vc1_con),
+	EVENT_PTR(vc2_con),
+	EVENT_PTR(vc3_con),
+	EVENT_PTR(vc4_con),
+	EVENT_PTR(vc5_con),
+	EVENT_PTR(vc6_con),
+	EVENT_PTR(vc7_con),
+	EVENT_PTR(vc8_con),
+	EVENT_PTR(vc9_con),
+	EVENT_PTR(vc10_con),
+	EVENT_PTR(vc11_con),
+	EVENT_PTR(vc12_con),
+	EVENT_PTR(vc13_con),
+	NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_events_group = {
+	.name = "events",
+	.attrs = thunder_ocx_tlk_events_attr,
+};
+
+static const struct attribute_group *thunder_ocx_tlk_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_ocx_tlk_format_group,
+	&thunder_ocx_tlk_events_group,
+	NULL,
+};
+
+struct pmu thunder_ocx_tlk_pmu = {
+	.attr_groups	= thunder_ocx_tlk_attr_groups,
+	.name		= "thunder_ocx_tlk",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read_ocx_tlk,
+};
+
+static int event_valid(u64 config)
+{
+	if (config <= OCX_TLK_EVENT_STAT_ERR_CNT ||
+	    (config >= OCX_TLK_EVENT_STAT_MAT0_CNT &&
+	     config <= OCX_TLK_EVENT_STAT_MAT3_CNT) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_CMD &&
+	     config <= OCX_TLK_EVENT_STAT_VC5_CMD) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_PKT &&
+	     config <= OCX_TLK_EVENT_STAT_VC13_PKT) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_CON &&
+	     config <= OCX_TLK_EVENT_STAT_VC13_CON))
+		return 1;
+	else
+		return 0;
+}
+
+int __init thunder_uncore_ocx_tlk_setup(void)
+{
+	int ret;
+
+	thunder_uncore_ocx_tlk = kzalloc(sizeof(struct thunder_uncore),
+					 GFP_KERNEL);
+	if (!thunder_uncore_ocx_tlk) {
+		ret = -ENOMEM;
+		goto fail_nomem;
+	}
+
+	ret = thunder_uncore_setup(thunder_uncore_ocx_tlk,
+				   PCI_DEVICE_ID_THUNDER_OCX,
+				   OCX_TLK_CONTROL_OFFSET,
+				   OCX_TLK_UNIT_OFFSET * OCX_TLK_NR_UNITS,
+				   &thunder_ocx_tlk_pmu,
+				   OCX_TLK_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_ocx_tlk->type = OCX_TLK_TYPE;
+	thunder_uncore_ocx_tlk->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_ocx_tlk);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread
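The read path in thunder_uncore_read_ocx_tlk() above sums the per-unit counters (all three transmit links live on one PCI device) and then accumulates a wrap-safe delta against the previously saved sum; with no overflow interrupts, unsigned 64-bit subtraction handles counter wrap for free. A user-space C sketch of that bookkeeping (illustrative names, not the kernel API):

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_NR_UNITS 3	/* mirrors OCX_TLK_NR_UNITS */

/*
 * Models the summed read: add up the per-unit hardware counters, fold
 * the difference from the last sample into the running event count,
 * and remember the new sample. Unsigned arithmetic makes the delta
 * correct even across a 64-bit counter wrap.
 */
static uint64_t sketch_read(const uint64_t hw[SKETCH_NR_UNITS],
			    uint64_t *prev_count, uint64_t *count)
{
	uint64_t new_sum = 0;
	int i;

	for (i = 0; i < SKETCH_NR_UNITS; i++)
		new_sum += hw[i];

	*count += new_sum - *prev_count;	/* wraps correctly */
	*prev_count = new_sum;
	return *count;
}
```

For example, samples of {10, 20, 30} followed by {15, 25, 35} yield a running count of 60, then 75.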

* [PATCH v2 5/5] arm64/perf: Cavium ThunderX OCX TLK uncore support
@ 2016-03-09 16:21   ` Jan Glauber
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-03-09 16:21 UTC (permalink / raw)
  To: linux-arm-kernel

Support for the OCX transmit link counters.

Signed-off-by: Jan Glauber <jglauber@cavium.com>
---
 drivers/perf/uncore/Makefile                |   3 +-
 drivers/perf/uncore/uncore_cavium.c         |   3 +
 drivers/perf/uncore/uncore_cavium.h         |   4 +
 drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++++++++++++
 4 files changed, 389 insertions(+), 1 deletion(-)
 create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c

diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
index 81479e8..88d1f57 100644
--- a/drivers/perf/uncore/Makefile
+++ b/drivers/perf/uncore/Makefile
@@ -1,4 +1,5 @@
 obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
 			      uncore_cavium_l2c_tad.o	\
 			      uncore_cavium_l2c_cbc.o	\
-			      uncore_cavium_lmc.o
+			      uncore_cavium_lmc.o	\
+			      uncore_cavium_ocx_tlk.o
diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
index 45c81d0..e210457 100644
--- a/drivers/perf/uncore/uncore_cavium.c
+++ b/drivers/perf/uncore/uncore_cavium.c
@@ -21,6 +21,8 @@ struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
 		return thunder_uncore_l2c_cbc;
 	else if (event->pmu->type == thunder_lmc_pmu.type)
 		return thunder_uncore_lmc;
+	else if (event->pmu->type == thunder_ocx_tlk_pmu.type)
+		return thunder_uncore_ocx_tlk;
 	else
 		return NULL;
 }
@@ -306,6 +308,7 @@ static int __init thunder_uncore_init(void)
 	thunder_uncore_l2c_tad_setup();
 	thunder_uncore_l2c_cbc_setup();
 	thunder_uncore_lmc_setup();
+	thunder_uncore_ocx_tlk_setup();
 	return 0;
 }
 late_initcall(thunder_uncore_init);
diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
index f14f6be..78e95c7 100644
--- a/drivers/perf/uncore/uncore_cavium.h
+++ b/drivers/perf/uncore/uncore_cavium.h
@@ -10,6 +10,7 @@ enum uncore_type {
 	L2C_TAD_TYPE,
 	L2C_CBC_TYPE,
 	LMC_TYPE,
+	OCX_TLK_TYPE,
 };
 
 extern int thunder_uncore_version;
@@ -70,9 +71,11 @@ extern struct device_attribute format_attr_node;
 extern struct thunder_uncore *thunder_uncore_l2c_tad;
 extern struct thunder_uncore *thunder_uncore_l2c_cbc;
 extern struct thunder_uncore *thunder_uncore_lmc;
+extern struct thunder_uncore *thunder_uncore_ocx_tlk;
 extern struct pmu thunder_l2c_tad_pmu;
 extern struct pmu thunder_l2c_cbc_pmu;
 extern struct pmu thunder_lmc_pmu;
+extern struct pmu thunder_ocx_tlk_pmu;
 
 /* Prototypes */
 struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
@@ -89,3 +92,4 @@ ssize_t thunder_events_sysfs_show(struct device *dev,
 int thunder_uncore_l2c_tad_setup(void);
 int thunder_uncore_l2c_cbc_setup(void);
 int thunder_uncore_lmc_setup(void);
+int thunder_uncore_ocx_tlk_setup(void);
diff --git a/drivers/perf/uncore/uncore_cavium_ocx_tlk.c b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
new file mode 100644
index 0000000..02f1bc1
--- /dev/null
+++ b/drivers/perf/uncore/uncore_cavium_ocx_tlk.c
@@ -0,0 +1,380 @@
+/*
+ * Cavium Thunder uncore PMU support, OCX TLK counters.
+ *
+ * Copyright 2016 Cavium Inc.
+ * Author: Jan Glauber <jan.glauber@cavium.com>
+ */
+
+#include <linux/slab.h>
+#include <linux/perf_event.h>
+
+#include "uncore_cavium.h"
+
+#ifndef PCI_DEVICE_ID_THUNDER_OCX
+#define PCI_DEVICE_ID_THUNDER_OCX		0xa013
+#endif
+
+#define OCX_TLK_NR_UNITS			3
+#define OCX_TLK_UNIT_OFFSET			0x2000
+#define OCX_TLK_CONTROL_OFFSET			0x10040
+#define OCX_TLK_COUNTER_OFFSET			0x10400
+
+#define OCX_TLK_STAT_DISABLE			0
+#define OCX_TLK_STAT_ENABLE			1
+
+/* OCX TLK event list */
+#define OCX_TLK_EVENT_STAT_IDLE_CNT		0x00
+#define OCX_TLK_EVENT_STAT_DATA_CNT		0x01
+#define OCX_TLK_EVENT_STAT_SYNC_CNT		0x02
+#define OCX_TLK_EVENT_STAT_RETRY_CNT		0x03
+#define OCX_TLK_EVENT_STAT_ERR_CNT		0x04
+
+#define OCX_TLK_EVENT_STAT_MAT0_CNT		0x08
+#define OCX_TLK_EVENT_STAT_MAT1_CNT		0x09
+#define OCX_TLK_EVENT_STAT_MAT2_CNT		0x0a
+#define OCX_TLK_EVENT_STAT_MAT3_CNT		0x0b
+
+#define OCX_TLK_EVENT_STAT_VC0_CMD		0x10
+#define OCX_TLK_EVENT_STAT_VC1_CMD		0x11
+#define OCX_TLK_EVENT_STAT_VC2_CMD		0x12
+#define OCX_TLK_EVENT_STAT_VC3_CMD		0x13
+#define OCX_TLK_EVENT_STAT_VC4_CMD		0x14
+#define OCX_TLK_EVENT_STAT_VC5_CMD		0x15
+
+#define OCX_TLK_EVENT_STAT_VC0_PKT		0x20
+#define OCX_TLK_EVENT_STAT_VC1_PKT		0x21
+#define OCX_TLK_EVENT_STAT_VC2_PKT		0x22
+#define OCX_TLK_EVENT_STAT_VC3_PKT		0x23
+#define OCX_TLK_EVENT_STAT_VC4_PKT		0x24
+#define OCX_TLK_EVENT_STAT_VC5_PKT		0x25
+#define OCX_TLK_EVENT_STAT_VC6_PKT		0x26
+#define OCX_TLK_EVENT_STAT_VC7_PKT		0x27
+#define OCX_TLK_EVENT_STAT_VC8_PKT		0x28
+#define OCX_TLK_EVENT_STAT_VC9_PKT		0x29
+#define OCX_TLK_EVENT_STAT_VC10_PKT		0x2a
+#define OCX_TLK_EVENT_STAT_VC11_PKT		0x2b
+#define OCX_TLK_EVENT_STAT_VC12_PKT		0x2c
+#define OCX_TLK_EVENT_STAT_VC13_PKT		0x2d
+
+#define OCX_TLK_EVENT_STAT_VC0_CON		0x30
+#define OCX_TLK_EVENT_STAT_VC1_CON		0x31
+#define OCX_TLK_EVENT_STAT_VC2_CON		0x32
+#define OCX_TLK_EVENT_STAT_VC3_CON		0x33
+#define OCX_TLK_EVENT_STAT_VC4_CON		0x34
+#define OCX_TLK_EVENT_STAT_VC5_CON		0x35
+#define OCX_TLK_EVENT_STAT_VC6_CON		0x36
+#define OCX_TLK_EVENT_STAT_VC7_CON		0x37
+#define OCX_TLK_EVENT_STAT_VC8_CON		0x38
+#define OCX_TLK_EVENT_STAT_VC9_CON		0x39
+#define OCX_TLK_EVENT_STAT_VC10_CON		0x3a
+#define OCX_TLK_EVENT_STAT_VC11_CON		0x3b
+#define OCX_TLK_EVENT_STAT_VC12_CON		0x3c
+#define OCX_TLK_EVENT_STAT_VC13_CON		0x3d
+
+#define OCX_TLK_MAX_COUNTER			OCX_TLK_EVENT_STAT_VC13_CON
+#define OCX_TLK_NR_COUNTERS			(OCX_TLK_MAX_COUNTER + 1)
+
+struct thunder_uncore *thunder_uncore_ocx_tlk;
+
+/*
+ * The OCX devices have a single device per node, therefore picking the
+ * first device from the list is correct.
+ */
+static inline void __iomem *map_offset(struct thunder_uncore_node *node,
+				       unsigned long addr, int offset, int nr)
+{
+	struct thunder_uncore_unit *unit;
+
+	unit = list_first_entry(&node->unit_list, struct thunder_uncore_unit,
+				entry);
+	return (void __iomem *) (addr + unit->map + nr * offset);
+}
+
+static void __iomem *map_offset_ocx_tlk(struct thunder_uncore_node *node,
+					unsigned long addr, int nr)
+{
+	return map_offset(node, addr, OCX_TLK_UNIT_OFFSET,
+			  nr);
+}
+
+/*
+ * Summarize counters across all TLKs. This differs from the other
+ * uncore PMUs because all TLKs sit on a single PCI device.
+ */
+static void thunder_uncore_read_ocx_tlk(struct perf_event *event)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	u64 prev, new = 0;
+	s64 delta;
+	int i;
+
+	/*
+	 * No counter overflow interrupts so we do not
+	 * have to worry about prev_count changing on us.
+	 */
+
+	prev = local64_read(&hwc->prev_count);
+
+	/* read counter values from all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		new += readq(map_offset_ocx_tlk(node, hwc->event_base, i));
+
+	local64_set(&hwc->prev_count, new);
+	delta = new - prev;
+	local64_add(delta, &event->count);
+}
+
+static void thunder_uncore_start(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	hwc->state = 0;
+
+	/* enable counters on all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		writeb(OCX_TLK_STAT_ENABLE,
+		       map_offset_ocx_tlk(node, hwc->config_base, i));
+
+	perf_event_update_userpage(event);
+}
+
+static void thunder_uncore_stop(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int i;
+
+	/* disable counters on all units */
+	node = get_node(hwc->config, uncore);
+	for (i = 0; i < OCX_TLK_NR_UNITS; i++)
+		writeb(OCX_TLK_STAT_DISABLE,
+		       map_offset_ocx_tlk(node, hwc->config_base, i));
+	hwc->state |= PERF_HES_STOPPED;
+
+	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
+		thunder_uncore_read_ocx_tlk(event);
+		hwc->state |= PERF_HES_UPTODATE;
+	}
+}
+
+static int thunder_uncore_add(struct perf_event *event, int flags)
+{
+	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
+	struct hw_perf_event *hwc = &event->hw;
+	struct thunder_uncore_node *node;
+	int id, i;
+
+	WARN_ON_ONCE(!uncore);
+	node = get_node(hwc->config, uncore);
+	id = get_id(hwc->config);
+
+	/* are we already assigned? */
+	if (hwc->idx != -1 && node->events[hwc->idx] == event)
+		goto out;
+
+	for (i = 0; i < node->num_counters; i++) {
+		if (node->events[i] == event) {
+			hwc->idx = i;
+			goto out;
+		}
+	}
+
+	/* counters are 1:1 */
+	hwc->idx = -1;
+	if (cmpxchg(&node->events[id], NULL, event) == NULL)
+		hwc->idx = id;
+
+out:
+	if (hwc->idx == -1)
+		return -EBUSY;
+
+	hwc->config_base = 0;
+	hwc->event_base = OCX_TLK_COUNTER_OFFSET - OCX_TLK_CONTROL_OFFSET +
+			hwc->idx * sizeof(unsigned long long);
+	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	if (flags & PERF_EF_START)
+		thunder_uncore_start(event, PERF_EF_RELOAD);
+	return 0;
+}
+
+PMU_FORMAT_ATTR(event, "config:0-5");
+
+static struct attribute *thunder_ocx_tlk_format_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_node.attr,
+	NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_format_group = {
+	.name = "format",
+	.attrs = thunder_ocx_tlk_format_attr,
+};
+
+EVENT_ATTR(idle_cnt,	OCX_TLK_EVENT_STAT_IDLE_CNT);
+EVENT_ATTR(data_cnt,	OCX_TLK_EVENT_STAT_DATA_CNT);
+EVENT_ATTR(sync_cnt,	OCX_TLK_EVENT_STAT_SYNC_CNT);
+EVENT_ATTR(retry_cnt,	OCX_TLK_EVENT_STAT_RETRY_CNT);
+EVENT_ATTR(err_cnt,	OCX_TLK_EVENT_STAT_ERR_CNT);
+EVENT_ATTR(mat0_cnt,	OCX_TLK_EVENT_STAT_MAT0_CNT);
+EVENT_ATTR(mat1_cnt,	OCX_TLK_EVENT_STAT_MAT1_CNT);
+EVENT_ATTR(mat2_cnt,	OCX_TLK_EVENT_STAT_MAT2_CNT);
+EVENT_ATTR(mat3_cnt,	OCX_TLK_EVENT_STAT_MAT3_CNT);
+EVENT_ATTR(vc0_cmd,	OCX_TLK_EVENT_STAT_VC0_CMD);
+EVENT_ATTR(vc1_cmd,	OCX_TLK_EVENT_STAT_VC1_CMD);
+EVENT_ATTR(vc2_cmd,	OCX_TLK_EVENT_STAT_VC2_CMD);
+EVENT_ATTR(vc3_cmd,	OCX_TLK_EVENT_STAT_VC3_CMD);
+EVENT_ATTR(vc4_cmd,	OCX_TLK_EVENT_STAT_VC4_CMD);
+EVENT_ATTR(vc5_cmd,	OCX_TLK_EVENT_STAT_VC5_CMD);
+EVENT_ATTR(vc0_pkt,	OCX_TLK_EVENT_STAT_VC0_PKT);
+EVENT_ATTR(vc1_pkt,	OCX_TLK_EVENT_STAT_VC1_PKT);
+EVENT_ATTR(vc2_pkt,	OCX_TLK_EVENT_STAT_VC2_PKT);
+EVENT_ATTR(vc3_pkt,	OCX_TLK_EVENT_STAT_VC3_PKT);
+EVENT_ATTR(vc4_pkt,	OCX_TLK_EVENT_STAT_VC4_PKT);
+EVENT_ATTR(vc5_pkt,	OCX_TLK_EVENT_STAT_VC5_PKT);
+EVENT_ATTR(vc6_pkt,	OCX_TLK_EVENT_STAT_VC6_PKT);
+EVENT_ATTR(vc7_pkt,	OCX_TLK_EVENT_STAT_VC7_PKT);
+EVENT_ATTR(vc8_pkt,	OCX_TLK_EVENT_STAT_VC8_PKT);
+EVENT_ATTR(vc9_pkt,	OCX_TLK_EVENT_STAT_VC9_PKT);
+EVENT_ATTR(vc10_pkt,	OCX_TLK_EVENT_STAT_VC10_PKT);
+EVENT_ATTR(vc11_pkt,	OCX_TLK_EVENT_STAT_VC11_PKT);
+EVENT_ATTR(vc12_pkt,	OCX_TLK_EVENT_STAT_VC12_PKT);
+EVENT_ATTR(vc13_pkt,	OCX_TLK_EVENT_STAT_VC13_PKT);
+EVENT_ATTR(vc0_con,	OCX_TLK_EVENT_STAT_VC0_CON);
+EVENT_ATTR(vc1_con,	OCX_TLK_EVENT_STAT_VC1_CON);
+EVENT_ATTR(vc2_con,	OCX_TLK_EVENT_STAT_VC2_CON);
+EVENT_ATTR(vc3_con,	OCX_TLK_EVENT_STAT_VC3_CON);
+EVENT_ATTR(vc4_con,	OCX_TLK_EVENT_STAT_VC4_CON);
+EVENT_ATTR(vc5_con,	OCX_TLK_EVENT_STAT_VC5_CON);
+EVENT_ATTR(vc6_con,	OCX_TLK_EVENT_STAT_VC6_CON);
+EVENT_ATTR(vc7_con,	OCX_TLK_EVENT_STAT_VC7_CON);
+EVENT_ATTR(vc8_con,	OCX_TLK_EVENT_STAT_VC8_CON);
+EVENT_ATTR(vc9_con,	OCX_TLK_EVENT_STAT_VC9_CON);
+EVENT_ATTR(vc10_con,	OCX_TLK_EVENT_STAT_VC10_CON);
+EVENT_ATTR(vc11_con,	OCX_TLK_EVENT_STAT_VC11_CON);
+EVENT_ATTR(vc12_con,	OCX_TLK_EVENT_STAT_VC12_CON);
+EVENT_ATTR(vc13_con,	OCX_TLK_EVENT_STAT_VC13_CON);
+
+static struct attribute *thunder_ocx_tlk_events_attr[] = {
+	EVENT_PTR(idle_cnt),
+	EVENT_PTR(data_cnt),
+	EVENT_PTR(sync_cnt),
+	EVENT_PTR(retry_cnt),
+	EVENT_PTR(err_cnt),
+	EVENT_PTR(mat0_cnt),
+	EVENT_PTR(mat1_cnt),
+	EVENT_PTR(mat2_cnt),
+	EVENT_PTR(mat3_cnt),
+	EVENT_PTR(vc0_cmd),
+	EVENT_PTR(vc1_cmd),
+	EVENT_PTR(vc2_cmd),
+	EVENT_PTR(vc3_cmd),
+	EVENT_PTR(vc4_cmd),
+	EVENT_PTR(vc5_cmd),
+	EVENT_PTR(vc0_pkt),
+	EVENT_PTR(vc1_pkt),
+	EVENT_PTR(vc2_pkt),
+	EVENT_PTR(vc3_pkt),
+	EVENT_PTR(vc4_pkt),
+	EVENT_PTR(vc5_pkt),
+	EVENT_PTR(vc6_pkt),
+	EVENT_PTR(vc7_pkt),
+	EVENT_PTR(vc8_pkt),
+	EVENT_PTR(vc9_pkt),
+	EVENT_PTR(vc10_pkt),
+	EVENT_PTR(vc11_pkt),
+	EVENT_PTR(vc12_pkt),
+	EVENT_PTR(vc13_pkt),
+	EVENT_PTR(vc0_con),
+	EVENT_PTR(vc1_con),
+	EVENT_PTR(vc2_con),
+	EVENT_PTR(vc3_con),
+	EVENT_PTR(vc4_con),
+	EVENT_PTR(vc5_con),
+	EVENT_PTR(vc6_con),
+	EVENT_PTR(vc7_con),
+	EVENT_PTR(vc8_con),
+	EVENT_PTR(vc9_con),
+	EVENT_PTR(vc10_con),
+	EVENT_PTR(vc11_con),
+	EVENT_PTR(vc12_con),
+	EVENT_PTR(vc13_con),
+	NULL,
+};
+
+static struct attribute_group thunder_ocx_tlk_events_group = {
+	.name = "events",
+	.attrs = thunder_ocx_tlk_events_attr,
+};
+
+static const struct attribute_group *thunder_ocx_tlk_attr_groups[] = {
+	&thunder_uncore_attr_group,
+	&thunder_ocx_tlk_format_group,
+	&thunder_ocx_tlk_events_group,
+	NULL,
+};
+
+struct pmu thunder_ocx_tlk_pmu = {
+	.attr_groups	= thunder_ocx_tlk_attr_groups,
+	.name		= "thunder_ocx_tlk",
+	.event_init	= thunder_uncore_event_init,
+	.add		= thunder_uncore_add,
+	.del		= thunder_uncore_del,
+	.start		= thunder_uncore_start,
+	.stop		= thunder_uncore_stop,
+	.read		= thunder_uncore_read_ocx_tlk,
+};
+
+static int event_valid(u64 config)
+{
+	if (config <= OCX_TLK_EVENT_STAT_ERR_CNT ||
+	    (config >= OCX_TLK_EVENT_STAT_MAT0_CNT &&
+	     config <= OCX_TLK_EVENT_STAT_MAT3_CNT) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_CMD &&
+	     config <= OCX_TLK_EVENT_STAT_VC5_CMD) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_PKT &&
+	     config <= OCX_TLK_EVENT_STAT_VC13_PKT) ||
+	    (config >= OCX_TLK_EVENT_STAT_VC0_CON &&
+	     config <= OCX_TLK_EVENT_STAT_VC13_CON))
+		return 1;
+	else
+		return 0;
+}
+
+int __init thunder_uncore_ocx_tlk_setup(void)
+{
+	int ret;
+
+	thunder_uncore_ocx_tlk = kzalloc(sizeof(struct thunder_uncore),
+					 GFP_KERNEL);
+	if (!thunder_uncore_ocx_tlk) {
+		ret = -ENOMEM;
+		goto fail_nomem;
+	}
+
+	ret = thunder_uncore_setup(thunder_uncore_ocx_tlk,
+				   PCI_DEVICE_ID_THUNDER_OCX,
+				   OCX_TLK_CONTROL_OFFSET,
+				   OCX_TLK_UNIT_OFFSET * OCX_TLK_NR_UNITS,
+				   &thunder_ocx_tlk_pmu,
+				   OCX_TLK_NR_COUNTERS);
+	if (ret)
+		goto fail;
+
+	thunder_uncore_ocx_tlk->type = OCX_TLK_TYPE;
+	thunder_uncore_ocx_tlk->event_valid = event_valid;
+	return 0;
+
+fail:
+	kfree(thunder_uncore_ocx_tlk);
+fail_nomem:
+	return ret;
+}
-- 
1.9.1

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-04-04 12:19   ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-04 12:19 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel

Hi Mark,

can you have a look at these patches?

Thanks,
Jan

On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> This patch series provides access to various counters on the ThunderX SOC.
> 
> For details of the uncore implementation see patch #1.
> 
> Patches #2-5 add the various ThunderX specific PMUs.
> 
> As suggested I've put the files under drivers/perf/uncore. I would
> prefer this location over drivers/bus because not all of the uncore
> drivers are bus related.
> 
> Changes to v1:
> - Added NUMA support
> - Fixed CPU hotplug by pmu migration
> - Moved files to drivers/perf/uncore
> - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
> - improved comments about overflow interrupts
> - removed max device limit
> - trimmed include files
> 
> Feedback welcome!
> Jan
> 
> -------------------------------------------------
> 
> Jan Glauber (5):
>   arm64/perf: Basic uncore counter support for Cavium ThunderX
>   arm64/perf: Cavium ThunderX L2C TAD uncore support
>   arm64/perf: Cavium ThunderX L2C CBC uncore support
>   arm64/perf: Cavium ThunderX LMC uncore support
>   arm64/perf: Cavium ThunderX OCX TLK uncore support
> 
>  drivers/perf/Makefile                       |   1 +
>  drivers/perf/uncore/Makefile                |   5 +
>  drivers/perf/uncore/uncore_cavium.c         | 314 +++++++++++++++
>  drivers/perf/uncore/uncore_cavium.h         |  95 +++++
>  drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
>  drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
>  drivers/perf/uncore/uncore_cavium_lmc.c     | 196 +++++++++
>  drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
>  8 files changed, 1828 insertions(+)
>  create mode 100644 drivers/perf/uncore/Makefile
>  create mode 100644 drivers/perf/uncore/uncore_cavium.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium.h
>  create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
>  create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
> 
> -- 
> 1.9.1

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
       [not found] ` <CAEiAFz3eCsX3VoNus_Rq+En5zuB8fAxNCbC3ktw2NqLKwC=_kA@mail.gmail.com>
@ 2016-04-19 10:35     ` Jan Glauber
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-19 10:35 UTC (permalink / raw)
  To: Mark Rutland, Will Deacon; +Cc: linux-kernel, linux-arm-kernel

Mark,

are these patches still queued or should I repost them?

--Jan

On Mon, Apr 04, 2016 at 01:03:13PM +0200, Jan Glauber wrote:
> Hi Mark,
> 
> can you have a look at these patches?
> 
> Thanks,
> Jan
> 
> 2016-03-09 17:21 GMT+01:00 Jan Glauber <jglauber@cavium.com>:
> 
>     This patch series provides access to various counters on the ThunderX SOC.
> 
>     For details of the uncore implementation see patch #1.
> 
>     Patches #2-5 add the various ThunderX specific PMUs.
> 
>     As suggested I've put the files under drivers/perf/uncore. I would
>     prefer this location over drivers/bus because not all of the uncore
>     drivers are bus related.
> 
>     Changes to v1:
>     - Added NUMA support
>     - Fixed CPU hotplug by pmu migration
>     - Moved files to drivers/perf/uncore
>     - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
>     - improved comments about overflow interrupts
>     - removed max device limit
>     - trimmed include files
> 
>     Feedback welcome!
>     Jan
> 
>     -------------------------------------------------
> 
>     Jan Glauber (5):
>       arm64/perf: Basic uncore counter support for Cavium ThunderX
>       arm64/perf: Cavium ThunderX L2C TAD uncore support
>       arm64/perf: Cavium ThunderX L2C CBC uncore support
>       arm64/perf: Cavium ThunderX LMC uncore support
>       arm64/perf: Cavium ThunderX OCX TLK uncore support
> 
>      drivers/perf/Makefile                       |   1 +
>      drivers/perf/uncore/Makefile                |   5 +
>      drivers/perf/uncore/uncore_cavium.c         | 314 +++++++++++++++
>      drivers/perf/uncore/uncore_cavium.h         |  95 +++++
>      drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
>      drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600
>     ++++++++++++++++++++++++++++
>      drivers/perf/uncore/uncore_cavium_lmc.c     | 196 +++++++++
>      drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
>      8 files changed, 1828 insertions(+)
>      create mode 100644 drivers/perf/uncore/Makefile
>      create mode 100644 drivers/perf/uncore/uncore_cavium.c
>      create mode 100644 drivers/perf/uncore/uncore_cavium.h
>      create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
>      create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
>      create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
>      create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
>    
>     --
>     1.9.1
> 
> 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/5] arm64/perf: Basic uncore counter support for Cavium ThunderX
  2016-03-09 16:21   ` Jan Glauber
@ 2016-04-19 15:06     ` Mark Rutland
  -1 siblings, 0 replies; 50+ messages in thread
From: Mark Rutland @ 2016-04-19 15:06 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Will Deacon, linux-kernel, linux-arm-kernel

On Wed, Mar 09, 2016 at 05:21:03PM +0100, Jan Glauber wrote:
> Provide "uncore" facilities for different non-CPU performance
> counter units. Based on Intel/AMD uncore pmu support.
> 
> The uncore drivers cover quite different functionality including
> L2 Cache, memory controllers and interconnects.
> 
> The uncore PMUs can be found under /sys/bus/event_source/devices.
> All counters are exported via sysfs in the corresponding events
> files under the PMU directory so the perf tool can list the event names.
> 
> There are some points that are special in this implementation:
> 
> 1) The PMU detection relies on PCI device detection. If a
>    matching PCI device is found the PMU is created. The code can deal
>    with multiple units of the same type, e.g. more than one memory
>    controller.
>    Note: There is also a CPUID check to determine the CPU variant,
>    this is needed to support different hardware versions that use
>    the same PCI IDs.
> 
> 2) Counters are summarized across different units of the same type
>    on one NUMA node.
>    For instance L2C TAD 0..7 are presented as a single counter
>    (adding the values from TAD 0 to 7). Although losing the ability
>    to read a single value the merged values are easier to use.

Merging within a NUMA node, but no further, seems a little arbitrary.

> 3) NUMA support. The device node id is used to group devices by node
>    so counters on one node can be merged. The NUMA node can be selected
>    via a new sysfs node attribute.
>    Without NUMA support all devices will be on node 0.

It doesn't seem great that this depends on kernel configuration (which
is independent of HW configuration). It seems confusing for the user,
and fragile.

Do we not have access to another way of grouping cores (e.g. a socket
ID), that's independent of kernel configuration? That seems to be how
the x86 uncore PMUs are handled.

If we don't have that information, it really feels like we need
additional info from FW (which would also solve the CPUID issue with
point 1), or this is likely to be very fragile.

> +void thunder_uncore_read(struct perf_event *event)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore_unit *unit;
> +	u64 prev, new = 0;
> +	s64 delta;
> +
> +	node = get_node(hwc->config, uncore);
> +
> +	/*
> +	 * No counter overflow interrupts so we do not
> +	 * have to worry about prev_count changing on us.
> +	 */
> +	prev = local64_read(&hwc->prev_count);
> +
> +	/* read counter values from all units on the node */
> +	list_for_each_entry(unit, &node->unit_list, entry)
> +		new += readq(hwc->event_base + unit->map);
> +
> +	local64_set(&hwc->prev_count, new);
> +	delta = new - prev;
> +	local64_add(delta, &event->count);
> +}
> +
> +void thunder_uncore_del(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	int i;
> +
> +	event->pmu->stop(event, PERF_EF_UPDATE);
> +
> +	/*
> +	 * For programmable counters we need to check where we installed it.
> +	 * To keep this function generic always test the more complicated
> +	 * case (free running counters won't need the loop).
> +	 */
> +	node = get_node(hwc->config, uncore);
> +	for (i = 0; i < node->num_counters; i++) {
> +		if (cmpxchg(&node->events[i], event, NULL) == event)
> +			break;
> +	}
> +	hwc->idx = -1;
> +}

It's very difficult to know what's going on here with the lack of a
corresponding *_add function. Is there any reason there is not a common
implementation, at least for the shared logic?

Similarly, it's difficult to know what state the read function is
expecting (e.g. are counters always initialised to 0 to start with)?

> +int thunder_uncore_event_init(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore *uncore;
> +
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	/* we do not support sampling */
> +	if (is_sampling_event(event))
> +		return -EINVAL;
> +
> +	/* counters do not have these bits */
> +	if (event->attr.exclude_user	||
> +	    event->attr.exclude_kernel	||
> +	    event->attr.exclude_host	||
> +	    event->attr.exclude_guest	||
> +	    event->attr.exclude_hv	||
> +	    event->attr.exclude_idle)
> +		return -EINVAL;
> +
> +	/* counters are 64 bit wide and without overflow interrupts */
> +

It would be good to describe the implications of this; otherwise it
seems like a floating comment.

> +	uncore = event_to_thunder_uncore(event);
> +	if (!uncore)
> +		return -ENODEV;
> +	if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
> +		return -EINVAL;
> +
> +	/* check NUMA node */
> +	node = get_node(event->attr.config, uncore);

As above, I don't think using Linux NUMA node IDs is a good idea,
especially for a user-facing ABI.

> +	if (!node) {
> +		pr_debug("Invalid numa node selected\n");
> +		return -EINVAL;
> +	}
> +
> +	hwc->config = event->attr.config;
> +	hwc->idx = -1;
> +	return 0;
> +}

What about the CPU handling?

Where do you verify that cpu != -1, and assign the event to a particular
CPU prior to the pmu::add callback? That should be common to all of your
uncore PMUs, and should live here.

> +static ssize_t node_show(struct device *dev, struct device_attribute *attr, char *page)
> +{
> +	if (NODES_SHIFT)
> +		return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
> +	else
> +		return sprintf(page, "config:16\n");
> +}

I'm not keen on this depending on the kernel configuration.

> +static int thunder_uncore_pmu_cpu_notifier(struct notifier_block *nb,
> +					   unsigned long action, void *data)
> +{
> +	struct thunder_uncore *uncore = container_of(nb, struct thunder_uncore, cpu_nb);
> +	int new_cpu, old_cpu = (long) data;
> +
> +	switch (action & ~CPU_TASKS_FROZEN) {
> +	case CPU_DOWN_PREPARE:
> +		if (!cpumask_test_and_clear_cpu(old_cpu, &thunder_active_mask))
> +			break;
> +		new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);

Above it was mentioned that events are grouped per node/socket. So surely
it doesn't make sense to migrate this to any arbitrary CPU, but only to
CPUs in the same node?

If I have active events for node 0, what happens when I hotplug out the
last CPU for that node (but have CPUs online in node 1)?

Is it guaranteed that power is retained for the device?

> +		if (new_cpu >= nr_cpu_ids)
> +			break;
> +		perf_pmu_migrate_context(uncore->pmu, old_cpu, new_cpu);
> +		cpumask_set_cpu(new_cpu, &thunder_active_mask);
> +		break;
> +	default:
> +		break;
> +	}
> +	return NOTIFY_OK;
> +}
> +
> +static struct thunder_uncore_node *alloc_node(struct thunder_uncore *uncore, int node_id, int counters)
> +{
> +	struct thunder_uncore_node *node;
> +
> +	node = kzalloc(sizeof(struct thunder_uncore_node), GFP_KERNEL);

Use:
	node = kzalloc(sizeof(*node), GFP_KERNEL);

> +int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
> +			 unsigned long offset, unsigned long size,
> +			 struct pmu *pmu, int counters)
> +{
> +	struct thunder_uncore_unit  *unit, *tmp;
> +	struct thunder_uncore_node *node;
> +	struct pci_dev *pdev = NULL;
> +	int ret, node_id, found = 0;
> +
> +	/* detect PCI devices */
> +	do {
> +		pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, device_id, pdev);
> +		if (!pdev)
> +			break;

the loop would look cleaner like:

	unsigned int vendor_id = PCI_VENDOR_ID_CAVIUM;

	while ((pdev = pci_get_device(vendor_id, device_id, pdev))) {

		...

	}

> +
> +		node_id = dev_to_node(&pdev->dev);
> +		/*
> +		 * -1 without NUMA, set to 0 because we always have at
> +		 *  least node 0.
> +		 */
> +		if (node_id < 0)
> +			node_id = 0;

Again, this seems fragile to me. I am very much not keen on this
behaviour varying based on a logically unrelated kernel configuration
option.

> +
> +		/* allocate node if necessary */
> +		if (!uncore->nodes[node_id])
> +			uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
> +
> +		node = uncore->nodes[node_id];
> +		if (!node) {
> +			ret = -ENOMEM;
> +			goto fail;
> +		}
> +
> +		unit = kzalloc(sizeof(struct thunder_uncore_unit), GFP_KERNEL);

Use:
	unit = kzalloc(sizeof(*unit), GFP_KERNEL)

> +	/*
> +	 * perf PMU is CPU dependent in difference to our uncore devices.
> +	 * Just pick a CPU and migrate away if it goes offline.
> +	 */
> +	cpumask_set_cpu(smp_processor_id(), &thunder_active_mask);

The current CPU is not guaranteed to be in the same node, no?

My comments earlier w.r.t. migration apply here too.

> +
> +	uncore->cpu_nb.notifier_call = thunder_uncore_pmu_cpu_notifier;
> +	uncore->cpu_nb.priority = CPU_PRI_PERF + 1;
> +	ret = register_cpu_notifier(&uncore->cpu_nb);
> +	if (ret)
> +		goto fail;
> +
> +	ret = perf_pmu_register(pmu, pmu->name, -1);
> +	if (ret)
> +		goto fail_pmu;
> +
> +	uncore->pmu = pmu;

Typically, the data related to the PMU is put in a struct which wraps
the struct pmu. That allows you to map either way using container_of.

Is there a particular reason for thunder_uncore to not contain the
struct PMU, rather than a pointer to it?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 50+ messages in thread

> +		if (cmpxchg(&node->events[i], event, NULL) == event)
> +			break;
> +	}
> +	hwc->idx = -1;
> +}

It's very difficult to know what's going on here with the lack of a
corresponding *_add function. Is there any reason there is not a common
implementation, at least for the shared logic?

Similarly, it's difficult to know what state the read function is
expecting (e.g. are counters always initialised to 0 to start with)?

> +int thunder_uncore_event_init(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore *uncore;
> +
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	/* we do not support sampling */
> +	if (is_sampling_event(event))
> +		return -EINVAL;
> +
> +	/* counters do not have these bits */
> +	if (event->attr.exclude_user	||
> +	    event->attr.exclude_kernel	||
> +	    event->attr.exclude_host	||
> +	    event->attr.exclude_guest	||
> +	    event->attr.exclude_hv	||
> +	    event->attr.exclude_idle)
> +		return -EINVAL;
> +
> +	/* counters are 64 bit wide and without overflow interrupts */
> +

It would be good to describe the implications of this; otherwise it
seems like a floating comment.

> +	uncore = event_to_thunder_uncore(event);
> +	if (!uncore)
> +		return -ENODEV;
> +	if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
> +		return -EINVAL;
> +
> +	/* check NUMA node */
> +	node = get_node(event->attr.config, uncore);

As above, I don't think using Linux NUMA node IDs is a good idea,
especially for a user-facing ABI.

> +	if (!node) {
> +		pr_debug("Invalid numa node selected\n");
> +		return -EINVAL;
> +	}
> +
> +	hwc->config = event->attr.config;
> +	hwc->idx = -1;
> +	return 0;
> +}

What about the CPU handling?

Where do you verify that cpu != -1, and assign the event to a particular
CPU prior to the pmu::add callback? That should be common to all of your
uncore PMUs, and should live here.

> +static ssize_t node_show(struct device *dev, struct device_attribute *attr, char *page)
> +{
> +	if (NODES_SHIFT)
> +		return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
> +	else
> +		return sprintf(page, "config:16\n");
> +}

I'm not keen on this depending on the kernel configuration.

> +static int thunder_uncore_pmu_cpu_notifier(struct notifier_block *nb,
> +					   unsigned long action, void *data)
> +{
> +	struct thunder_uncore *uncore = container_of(nb, struct thunder_uncore, cpu_nb);
> +	int new_cpu, old_cpu = (long) data;
> +
> +	switch (action & ~CPU_TASKS_FROZEN) {
> +	case CPU_DOWN_PREPARE:
> +		if (!cpumask_test_and_clear_cpu(old_cpu, &thunder_active_mask))
> +			break;
> +		new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);

Above it was mentioned that events are grouped per node/socket. So surely
it doesn't make sense to migrate this to an arbitrary CPU, but only to
CPUs in the same node?

If I have active events for node 0, what happens when I hotplug out the
last CPU for that node (but have CPUs online in node 1)?

Is it guaranteed that power is retained for the device?

> +		if (new_cpu >= nr_cpu_ids)
> +			break;
> +		perf_pmu_migrate_context(uncore->pmu, old_cpu, new_cpu);
> +		cpumask_set_cpu(new_cpu, &thunder_active_mask);
> +		break;
> +	default:
> +		break;
> +	}
> +	return NOTIFY_OK;
> +}
> +
> +static struct thunder_uncore_node *alloc_node(struct thunder_uncore *uncore, int node_id, int counters)
> +{
> +	struct thunder_uncore_node *node;
> +
> +	node = kzalloc(sizeof(struct thunder_uncore_node), GFP_KERNEL);

Use:
	node = kzalloc(sizeof(*node), GFP_KERNEL);

> +int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
> +			 unsigned long offset, unsigned long size,
> +			 struct pmu *pmu, int counters)
> +{
> +	struct thunder_uncore_unit  *unit, *tmp;
> +	struct thunder_uncore_node *node;
> +	struct pci_dev *pdev = NULL;
> +	int ret, node_id, found = 0;
> +
> +	/* detect PCI devices */
> +	do {
> +		pdev = pci_get_device(PCI_VENDOR_ID_CAVIUM, device_id, pdev);
> +		if (!pdev)
> +			break;

The loop would look cleaner written as:

	unsigned int vendor_id = PCI_VENDOR_ID_CAVIUM;

	while ((pdev = pci_get_device(vendor_id, device_id, pdev))) {

		...

	}

> +
> +		node_id = dev_to_node(&pdev->dev);
> +		/*
> +		 * -1 without NUMA, set to 0 because we always have at
> +		 *  least node 0.
> +		 */
> +		if (node_id < 0)
> +			node_id = 0;

Again, this seems fragile to me. I am very much not keen on this
behaviour varying based on a logically unrelated kernel configuration
option.

> +
> +		/* allocate node if necessary */
> +		if (!uncore->nodes[node_id])
> +			uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
> +
> +		node = uncore->nodes[node_id];
> +		if (!node) {
> +			ret = -ENOMEM;
> +			goto fail;
> +		}
> +
> +		unit = kzalloc(sizeof(struct thunder_uncore_unit), GFP_KERNEL);

Use:
	unit = kzalloc(sizeof(*unit), GFP_KERNEL)

> +	/*
> +	 * perf PMU is CPU dependent in difference to our uncore devices.
> +	 * Just pick a CPU and migrate away if it goes offline.
> +	 */
> +	cpumask_set_cpu(smp_processor_id(), &thunder_active_mask);

The current CPU is not guaranteed to be in the same node, no?

My comments earlier w.r.t. migration apply here too.

> +
> +	uncore->cpu_nb.notifier_call = thunder_uncore_pmu_cpu_notifier;
> +	uncore->cpu_nb.priority = CPU_PRI_PERF + 1;
> +	ret = register_cpu_notifier(&uncore->cpu_nb);
> +	if (ret)
> +		goto fail;
> +
> +	ret = perf_pmu_register(pmu, pmu->name, -1);
> +	if (ret)
> +		goto fail_pmu;
> +
> +	uncore->pmu = pmu;

Typically, the data related to the PMU is put in a struct which wraps
the struct pmu. That allows you to map either way using container_of.

Is there a particular reason for thunder_uncore to not contain the
struct PMU, rather than a pointer to it?

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 2/5] arm64/perf: Cavium ThunderX L2C TAD uncore support
  2016-03-09 16:21   ` Jan Glauber
@ 2016-04-19 15:43     ` Mark Rutland
  -1 siblings, 0 replies; 50+ messages in thread
From: Mark Rutland @ 2016-04-19 15:43 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Will Deacon, linux-kernel, linux-arm-kernel

On Wed, Mar 09, 2016 at 05:21:04PM +0100, Jan Glauber wrote:
> Support counters of the L2 Cache tag and data units.
> 
> Also support pass2 added/modified counters by checking MIDR.
> 
> Signed-off-by: Jan Glauber <jglauber@cavium.com>
> ---
>  drivers/perf/uncore/Makefile                |   3 +-
>  drivers/perf/uncore/uncore_cavium.c         |   6 +-
>  drivers/perf/uncore/uncore_cavium.h         |   7 +-
>  drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600 ++++++++++++++++++++++++++++
>  4 files changed, 613 insertions(+), 3 deletions(-)
>  create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
> 
> diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
> index b9c72c2..6a16caf 100644
> --- a/drivers/perf/uncore/Makefile
> +++ b/drivers/perf/uncore/Makefile
> @@ -1 +1,2 @@
> -obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o
> +obj-$(CONFIG_ARCH_THUNDER) += uncore_cavium.o		\
> +			      uncore_cavium_l2c_tad.o
> diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
> index 4fd5e45..b92b2ae 100644
> --- a/drivers/perf/uncore/uncore_cavium.c
> +++ b/drivers/perf/uncore/uncore_cavium.c
> @@ -15,7 +15,10 @@ int thunder_uncore_version;
>  
>  struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event)
>  {
> -	return NULL;
> +	if (event->pmu->type == thunder_l2c_tad_pmu.type)
> +		return thunder_uncore_l2c_tad;
> +	else
> +		return NULL;
>  }

If thunder_uncore contained the relevant struct pmu, you wouldn't need
this function.

You could take event->pmu, and use container_of to get the relevant
thunder_uncore.

So please do that and get rid of this function.

>  
>  void thunder_uncore_read(struct perf_event *event)
> @@ -296,6 +299,7 @@ static int __init thunder_uncore_init(void)
>  		thunder_uncore_version = 1;
>  	pr_info("PMU version: %d\n", thunder_uncore_version);
>  
> +	thunder_uncore_l2c_tad_setup();
>  	return 0;
>  }
>  late_initcall(thunder_uncore_init);
> diff --git a/drivers/perf/uncore/uncore_cavium.h b/drivers/perf/uncore/uncore_cavium.h
> index c799709..7a9c367 100644
> --- a/drivers/perf/uncore/uncore_cavium.h
> +++ b/drivers/perf/uncore/uncore_cavium.h
> @@ -7,7 +7,7 @@
>  #define pr_fmt(fmt)     "thunderx_uncore: " fmt
>  
>  enum uncore_type {
> -	NOP_TYPE,
> +	L2C_TAD_TYPE,
>  };
>  
>  extern int thunder_uncore_version;
> @@ -65,6 +65,9 @@ static inline struct thunder_uncore_node *get_node(u64 config,
>  extern struct attribute_group thunder_uncore_attr_group;
>  extern struct device_attribute format_attr_node;
>  
> +extern struct thunder_uncore *thunder_uncore_l2c_tad;
> +extern struct pmu thunder_l2c_tad_pmu;

The above hopefully means you can get rid of these.

>  /* Prototypes */
>  struct thunder_uncore *event_to_thunder_uncore(struct perf_event *event);
>  void thunder_uncore_del(struct perf_event *event, int flags);
> @@ -76,3 +79,5 @@ int thunder_uncore_setup(struct thunder_uncore *uncore, int id,
>  ssize_t thunder_events_sysfs_show(struct device *dev,
>  				  struct device_attribute *attr,
>  				  char *page);
> +
> +int thunder_uncore_l2c_tad_setup(void);
> diff --git a/drivers/perf/uncore/uncore_cavium_l2c_tad.c b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
> new file mode 100644
> index 0000000..c8dc305
> --- /dev/null
> +++ b/drivers/perf/uncore/uncore_cavium_l2c_tad.c
> @@ -0,0 +1,600 @@
> +/*
> + * Cavium Thunder uncore PMU support, L2C TAD counters.

It would be good to put an explanation of the TAD unit here, even if
just expanding that to Tag And Data.

> + *
> + * Copyright 2016 Cavium Inc.
> + * Author: Jan Glauber <jan.glauber@cavium.com>
> + */
> +
> +#include <linux/slab.h>
> +#include <linux/perf_event.h>

Minor nit, but as a general note I'd recommend alphabetically sorting
your includes now. 

That way any subsequent additions/removals are less likely to cause
painful conflicts (so long as they retain that order).

> +static void thunder_uncore_start(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore_unit *unit;
> +	u64 prev;
> +	int id;
> +
> +	node = get_node(hwc->config, uncore);
> +	id = get_id(hwc->config);
> +
> +	/* restore counter value divided by units into all counters */
> +	if (flags & PERF_EF_RELOAD) {
> +		prev = local64_read(&hwc->prev_count);
> +		prev = prev / node->nr_units;
> +
> +		list_for_each_entry(unit, &node->unit_list, entry)
> +			writeq(prev, hwc->event_base + unit->map);
> +	}

It would be vastly simpler to always restore zero into all counters, and
to update prev_count to account for this.

That will also save you any rounding loss from the division.

> +
> +	hwc->state = 0;
> +
> +	/* write byte in control registers for all units on the node */
> +	list_for_each_entry(unit, &node->unit_list, entry)
> +		writeb(id, hwc->config_base + unit->map);

That comment isn't very helpful. What is the intent and effect of this
write?

> +
> +	perf_event_update_userpage(event);
> +}
> +
> +static void thunder_uncore_stop(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore_unit *unit;
> +
> +	/* reset selection value for all units on the node */
> +	node = get_node(hwc->config, uncore);
> +
> +	list_for_each_entry(unit, &node->unit_list, entry)
> +		writeb(L2C_TAD_EVENTS_DISABLED, hwc->config_base + unit->map);
> +	hwc->state |= PERF_HES_STOPPED;
> +
> +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> +		thunder_uncore_read(event);
> +		hwc->state |= PERF_HES_UPTODATE;
> +	}
> +}
> +
> +static int thunder_uncore_add(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	int i;
> +
> +	WARN_ON_ONCE(!uncore);

This is trivially never possible if uncore contains the pmu (or we
couldn't have initialised the event in the first place).

> +	node = get_node(hwc->config, uncore);
> +
> +	/* are we already assigned? */
> +	if (hwc->idx != -1 && node->events[hwc->idx] == event)
> +		goto out;

Why would the event already be assigned a particular counter?

Which other piece of code might do that?

As far as I can see, nothing else can.

> +
> +	for (i = 0; i < node->num_counters; i++) {
> +		if (node->events[i] == event) {
> +			hwc->idx = i;
> +			goto out;
> +		}
> +	}

This should never happen, in the absence of a programming error. An
event should not be added multiple times, and adds and dels should be
balanced.

> +
> +	/* if not take the first available counter */
> +	hwc->idx = -1;
> +	for (i = 0; i < node->num_counters; i++) {
> +		if (cmpxchg(&node->events[i], NULL, event) == NULL) {
> +			hwc->idx = i;
> +			break;
> +		}
> +	}
> +out:
> +	if (hwc->idx == -1)
> +		return -EBUSY;
> +
> +	hwc->config_base = hwc->idx;
> +	hwc->event_base = L2C_TAD_COUNTER_OFFSET +
> +			  hwc->idx * sizeof(unsigned long long);

What's going on here?

I see that we use hwc->event_base as an offset into registers in
the HW, so sizeof(unsigned long long) is unusual.

I'm guessing that you're figuring out the address of a 64 bit register.
A comment, and sizeof(u64) would be better.

> +EVENT_ATTR(l2t_hit,	L2C_TAD_EVENT_L2T_HIT);
> +EVENT_ATTR(l2t_miss,	L2C_TAD_EVENT_L2T_MISS);
> +EVENT_ATTR(l2t_noalloc,	L2C_TAD_EVENT_L2T_NOALLOC);
> +EVENT_ATTR(l2_vic,	L2C_TAD_EVENT_L2_VIC);
> +EVENT_ATTR(sc_fail,	L2C_TAD_EVENT_SC_FAIL);
> +EVENT_ATTR(sc_pass,	L2C_TAD_EVENT_SC_PASS);
> +EVENT_ATTR(lfb_occ,	L2C_TAD_EVENT_LFB_OCC);
> +EVENT_ATTR(wait_lfb,	L2C_TAD_EVENT_WAIT_LFB);
> +EVENT_ATTR(wait_vab,	L2C_TAD_EVENT_WAIT_VAB);
> +EVENT_ATTR(rtg_hit,	L2C_TAD_EVENT_RTG_HIT);
> +EVENT_ATTR(rtg_miss,	L2C_TAD_EVENT_RTG_MISS);
> +EVENT_ATTR(l2_rtg_vic,	L2C_TAD_EVENT_L2_RTG_VIC);
> +EVENT_ATTR(l2_open_oci,	L2C_TAD_EVENT_L2_OPEN_OCI);

> +static struct attribute *thunder_l2c_tad_events_attr[] = {
> +	EVENT_PTR(l2t_hit),
> +	EVENT_PTR(l2t_miss),
> +	EVENT_PTR(l2t_noalloc),
> +	EVENT_PTR(l2_vic),
> +	EVENT_PTR(sc_fail),
> +	EVENT_PTR(sc_pass),
> +	EVENT_PTR(lfb_occ),
> +	EVENT_PTR(wait_lfb),
> +	EVENT_PTR(wait_vab),
> +	EVENT_PTR(rtg_hit),
> +	EVENT_PTR(rtg_miss),
> +	EVENT_PTR(l2_rtg_vic),
> +	EVENT_PTR(l2_open_oci),

This duplication is tedious.

Please do something like we did for CCI in commit 5e442eba342e567e
("arm-cci: simplify sysfs attr handling") so you only need to define
each attribute once to create it and place it in the relevant attribute
pointer list.

Likewise for the other PMUs.

> +static struct attribute_group thunder_l2c_tad_events_group = {
> +	.name = "events",
> +	.attrs = NULL,
> +};
> +
> +static const struct attribute_group *thunder_l2c_tad_attr_groups[] = {
> +	&thunder_uncore_attr_group,
> +	&thunder_l2c_tad_format_group,
> +	&thunder_l2c_tad_events_group,
> +	NULL,
> +};
> +
> +struct pmu thunder_l2c_tad_pmu = {
> +	.attr_groups	= thunder_l2c_tad_attr_groups,
> +	.name		= "thunder_l2c_tad",
> +	.event_init	= thunder_uncore_event_init,
> +	.add		= thunder_uncore_add,
> +	.del		= thunder_uncore_del,
> +	.start		= thunder_uncore_start,
> +	.stop		= thunder_uncore_stop,
> +	.read		= thunder_uncore_read,
> +};
> +
> +static int event_valid(u64 config)

A bool would be clearer.

> +{
> +	if ((config > 0 && config <= L2C_TAD_EVENT_WAIT_VAB) ||
> +	    config == L2C_TAD_EVENT_RTG_HIT ||
> +	    config == L2C_TAD_EVENT_RTG_MISS ||
> +	    config == L2C_TAD_EVENT_L2_RTG_VIC ||
> +	    config == L2C_TAD_EVENT_L2_OPEN_OCI ||
> +	    ((config & 0x80) && ((config & 0xf) <= 3)))

What are these last cases?

> +		return 1;
> +
> +	if (thunder_uncore_version == 1)
> +		if (config == L2C_TAD_EVENT_OPEN_CCPI ||
> +		    (config >= L2C_TAD_EVENT_LOOKUP &&
> +		     config <= L2C_TAD_EVENT_LOOKUP_ALL) ||
> +		    (config >= L2C_TAD_EVENT_TAG_ALC_HIT &&
> +		     config <= L2C_TAD_EVENT_OCI_RTG_ALC_VIC &&
> +		     config != 0x4d &&
> +		     config != 0x66 &&
> +		     config != 0x67))

Likewise, what are these last cases?

Why not rule these out explicitly first?

> +			return 1;
> +
> +	return 0;
> +}
> +
> +int __init thunder_uncore_l2c_tad_setup(void)
> +{
> +	int ret = -ENOMEM;
> +
> +	thunder_uncore_l2c_tad = kzalloc(sizeof(struct thunder_uncore),
> +					 GFP_KERNEL);

As previously, sizeof(*ptr) is preferred to sizeof(type), though it
doesn't save you anything here.

> +	if (!thunder_uncore_l2c_tad)
> +		goto fail_nomem;
> +
> +	if (thunder_uncore_version == 0)
> +		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_events_attr;
> +	else /* default */
> +		thunder_l2c_tad_events_group.attrs = thunder_l2c_tad_pass2_events_attr;
> +
> +	ret = thunder_uncore_setup(thunder_uncore_l2c_tad,
> +			   PCI_DEVICE_ID_THUNDER_L2C_TAD,
> +			   L2C_TAD_CONTROL_OFFSET,
> +			   L2C_TAD_COUNTER_OFFSET + L2C_TAD_NR_COUNTERS
> +				* sizeof(unsigned long long),

It would be nicer to calculate the size earlier (with sizeof(u64) as
previously mentioned).

> +			   &thunder_l2c_tad_pmu,
> +			   L2C_TAD_NR_COUNTERS);
> +	if (ret)
> +		goto fail;
> +
> +	thunder_uncore_l2c_tad->type = L2C_TAD_TYPE;

I believe this can go, with thunder_uncore containing a pmu.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 3/5] arm64/perf: Cavium ThunderX L2C CBC uncore support
  2016-03-09 16:21   ` Jan Glauber
@ 2016-04-19 15:56     ` Mark Rutland
  -1 siblings, 0 replies; 50+ messages in thread
From: Mark Rutland @ 2016-04-19 15:56 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Will Deacon, linux-kernel, linux-arm-kernel

On Wed, Mar 09, 2016 at 05:21:05PM +0100, Jan Glauber wrote:
> @@ -300,6 +302,7 @@ static int __init thunder_uncore_init(void)
>  	pr_info("PMU version: %d\n", thunder_uncore_version);
>  
>  	thunder_uncore_l2c_tad_setup();
> +	thunder_uncore_l2c_cbc_setup();
>  	return 0;
>  }
>  late_initcall(thunder_uncore_init);

Why aren't these just probed independently, as separate PCI devices,
rather than using a shared initcall?

You'd have to read the MIDR a few times, but that's a tiny fraction of
the rest of the cost of probing, and you can keep the common portion as
a stateless library.

> +int l2c_cbc_events[L2C_CBC_NR_COUNTERS] = {
> +	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,
> +	0x08, 0x09, 0x0a, 0x0b, 0x0c,
> +	0x10, 0x11, 0x12, 0x13
> +};

What are these magic numbers?

A comment would be helpful here.
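
As an illustration of the kind of comment being asked for, here is a
user-space sketch of the array with each slot annotated (the group
labels are placeholders, not the real ThunderX event names), together
with a model of the matching loop from thunder_uncore_add() that
depends on this slot-to-event mapping:

```c
#include <assert.h>

#define L2C_CBC_NR_COUNTERS 16

/*
 * Each slot holds the hardware event number that the corresponding
 * fixed counter tracks; the gaps (0x07, 0x0d-0x0f) are apparently
 * unimplemented event numbers.  Group names below are hypothetical.
 */
static const int l2c_cbc_events[L2C_CBC_NR_COUNTERS] = {
	0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06,	/* first event bank */
	0x08, 0x09, 0x0a, 0x0b, 0x0c,			/* second event bank */
	0x10, 0x11, 0x12, 0x13				/* third event bank */
};

/* Model of the id -> counter-index lookup in thunder_uncore_add(). */
static int event_id_to_idx(int id)
{
	int i;

	for (i = 0; i < L2C_CBC_NR_COUNTERS; i++)
		if (l2c_cbc_events[i] == id)
			return i;
	return -1;	/* not a countable event */
}
```

With such a comment in place, a reader can see at a glance why the
counters are "self-sustained" and why idx must match the slot.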

> +
> +static void thunder_uncore_start(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	struct thunder_uncore_unit *unit;
> +	u64 prev;
> +
> +	node = get_node(hwc->config, uncore);
> +
> +	/* restore counter value divided by units into all counters */
> +	if (flags & PERF_EF_RELOAD) {
> +		prev = local64_read(&hwc->prev_count);
> +		prev = prev / node->nr_units;
> +
> +		list_for_each_entry(unit, &node->unit_list, entry)
> +			writeq(prev, hwc->event_base + unit->map);
> +	}
> +
> +	hwc->state = 0;
> +	perf_event_update_userpage(event);
> +}

This looks practically identical to the code in patch 2. Please factor
the common portion into the library code from patch 1 (zeroing the
registers), and share it.
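
The shared portion is essentially "split the saved total evenly across
all units of the node and write it back". A user-space model of what
such a library helper could look like (kernel list iteration and the
writeq() to the MMIO mapping replaced by a plain array; names
hypothetical):

```c
#include <assert.h>
#include <stdint.h>

/* Model of one unit: 'counter' stands in for the mapped register. */
struct unit_model {
	uint64_t counter;
};

/*
 * Shared restore logic: the saved total is divided evenly across the
 * node's units so that summing them on the next read reproduces
 * (approximately) the saved value.  Note the division truncates, so
 * up to nr_units - 1 counts can be lost per reload.
 */
static void uncore_restore(struct unit_model *units, int nr_units,
			   uint64_t prev)
{
	uint64_t per_unit = prev / nr_units;
	int i;

	for (i = 0; i < nr_units; i++)
		units[i].counter = per_unit;	/* writeq() in the driver */
}
```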

> +
> +static void thunder_uncore_stop(struct perf_event *event, int flags)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> +		thunder_uncore_read(event);
> +		hwc->state |= PERF_HES_UPTODATE;
> +	}
> +}

There's no stop control for this PMU?

I was under the impression the core perf code could read the counter
while it was stopped, and it would unexpectedly count increasing values.

Does PERF_HES_UPTODATE stop the core from reading the counter, or is
it the responsibility of the backend to check that? I see that
thunder_uncore_read does not.

Do you need PERF_HES_STOPPED, or does that not matter due to the lack of
interrupts?

> +
> +static int thunder_uncore_add(struct perf_event *event, int flags)
> +{
> +	struct thunder_uncore *uncore = event_to_thunder_uncore(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct thunder_uncore_node *node;
> +	int id, i;
> +
> +	WARN_ON_ONCE(!uncore);
> +	node = get_node(hwc->config, uncore);
> +	id = get_id(hwc->config);
> +
> +	/* are we already assigned? */
> +	if (hwc->idx != -1 && node->events[hwc->idx] == event)
> +		goto out;
> +
> +	for (i = 0; i < node->num_counters; i++) {
> +		if (node->events[i] == event) {
> +			hwc->idx = i;
> +			goto out;
> +		}
> +	}
> +
> +	/* these counters are self-sustained so idx must match the counter! */
> +	hwc->idx = -1;
> +	for (i = 0; i < node->num_counters; i++) {
> +		if (l2c_cbc_events[i] == id) {
> +			if (cmpxchg(&node->events[i], NULL, event) == NULL) {
> +				hwc->idx = i;
> +				break;
> +			}
> +		}
> +	}
> +
> +out:
> +	if (hwc->idx == -1)
> +		return -EBUSY;
> +
> +	hwc->event_base = id * sizeof(unsigned long long);
> +
> +	/* counter is not stoppable so avoiding PERF_HES_STOPPED */
> +	hwc->state = PERF_HES_UPTODATE;
> +
> +	if (flags & PERF_EF_START)
> +		thunder_uncore_start(event, 0);
> +
> +	return 0;
> +}

This looks practically identical to code from patch 2, and all my
comments there apply.

Please factor this out into the library code in patch 1, taking into
account my comments on patch 2.

Likewise, the remainder of the file is mostly a copy+paste of patch 2.
All those comments apply equally to this patch.

Thanks,
Mark.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-19 10:35     ` Jan Glauber
@ 2016-04-19 16:03       ` Mark Rutland
  -1 siblings, 0 replies; 50+ messages in thread
From: Mark Rutland @ 2016-04-19 16:03 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Will Deacon, linux-kernel, linux-arm-kernel

On Tue, Apr 19, 2016 at 12:35:10PM +0200, Jan Glauber wrote:
> Mark,
> 
> are these patches still queued or should I repost them?

Apologies for the delay. I've just given these a review.

I note an awful lot of duplication over patches 2-5. The pmu::add
implementations are practically identical, and I suspect can be shared
without much difficulty. Please try to address that unnecessary
duplication.

My comments on patches 2 and 3 largely apply to 4 and 5, so I haven't
reviewed those individually.

Thanks,
Mark.


> 
> --Jan
> 
> On Mon, Apr 04, 2016 at 01:03:13PM +0200, Jan Glauber wrote:
> > Hi Mark,
> > 
> > can you have a look at these patches?
> > 
> > Thanks,
> > Jan
> > 
> > 2016-03-09 17:21 GMT+01:00 Jan Glauber <jglauber@cavium.com>:
> > 
> >     This patch series provides access to various counters on the ThunderX SOC.
> > 
> >     For details of the uncore implementation see patch #1.
> > 
> >     Patches #2-5 add the various ThunderX specific PMUs.
> > 
> >     As suggested I've put the files under drivers/perf/uncore. I would
> >     prefer this location over drivers/bus because not all of the uncore
> >     drivers are bus related.
> > 
> >     Changes to v1:
> >     - Added NUMA support
> >     - Fixed CPU hotplug by pmu migration
> >     - Moved files to drivers/perf/uncore
> >     - Removed OCX FRC and LNE drivers, these will fit better into an edac driver
> >     - improved comments about overflow interrupts
> >     - removed max device limit
> >     - trimmed include files
> > 
> >     Feedback welcome!
> >     Jan
> > 
> >     -------------------------------------------------
> > 
> >     Jan Glauber (5):
> >       arm64/perf: Basic uncore counter support for Cavium ThunderX
> >       arm64/perf: Cavium ThunderX L2C TAD uncore support
> >       arm64/perf: Cavium ThunderX L2C CBC uncore support
> >       arm64/perf: Cavium ThunderX LMC uncore support
> >       arm64/perf: Cavium ThunderX OCX TLK uncore support
> > 
> >      drivers/perf/Makefile                       |   1 +
> >      drivers/perf/uncore/Makefile                |   5 +
> >      drivers/perf/uncore/uncore_cavium.c         | 314 +++++++++++++++
> >      drivers/perf/uncore/uncore_cavium.h         |  95 +++++
> >      drivers/perf/uncore/uncore_cavium_l2c_cbc.c | 237 +++++++++++
> >      drivers/perf/uncore/uncore_cavium_l2c_tad.c | 600
> >     ++++++++++++++++++++++++++++
> >      drivers/perf/uncore/uncore_cavium_lmc.c     | 196 +++++++++
> >      drivers/perf/uncore/uncore_cavium_ocx_tlk.c | 380 ++++++++++++++++++
> >      8 files changed, 1828 insertions(+)
> >      create mode 100644 drivers/perf/uncore/Makefile
> >      create mode 100644 drivers/perf/uncore/uncore_cavium.c
> >      create mode 100644 drivers/perf/uncore/uncore_cavium.h
> >      create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_cbc.c
> >      create mode 100644 drivers/perf/uncore/uncore_cavium_l2c_tad.c
> >      create mode 100644 drivers/perf/uncore/uncore_cavium_lmc.c
> >      create mode 100644 drivers/perf/uncore/uncore_cavium_ocx_tlk.c
> >    
> >     --
> >     1.9.1
> > 
> > 
> > 
> 

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 1/5] arm64/perf: Basic uncore counter support for Cavium ThunderX
  2016-04-19 15:06     ` Mark Rutland
@ 2016-04-20 12:29       ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-20 12:29 UTC (permalink / raw)
  To: Mark Rutland; +Cc: Will Deacon, linux-kernel, linux-arm-kernel

On Tue, Apr 19, 2016 at 04:06:08PM +0100, Mark Rutland wrote:
> On Wed, Mar 09, 2016 at 05:21:03PM +0100, Jan Glauber wrote:
> > Provide "uncore" facilities for different non-CPU performance
> > counter units. Based on Intel/AMD uncore pmu support.
> > 
> > The uncore drivers cover quite different functionality including
> > L2 Cache, memory controllers and interconnects.
> > 
> > The uncore PMUs can be found under /sys/bus/event_source/devices.
> > All counters are exported via sysfs in the corresponding events
> > files under the PMU directory so the perf tool can list the event names.
> > 
> > There are some points that are special in this implementation:
> > 
> > 1) The PMU detection relies on PCI device detection. If a
> >    matching PCI device is found the PMU is created. The code can deal
> >    with multiple units of the same type, e.g. more than one memory
> >    controller.
> >    Note: There is also a CPUID check to determine the CPU variant,
> >    this is needed to support different hardware versions that use
> >    the same PCI IDs.
> > 
> > 2) Counters are summarized across different units of the same type
> >    on one NUMA node.
> >    For instance L2C TAD 0..7 are presented as a single counter
> >    (adding the values from TAD 0 to 7). Although losing the ability
> >    to read a single value the merged values are easier to use.
> 
> Merging within a NUMA node, but no further seems a little arbitrary.
> 
> > 3) NUMA support. The device node id is used to group devices by node
> >    so counters on one node can be merged. The NUMA node can be selected
> >    via a new sysfs node attribute.
> >    Without NUMA support all devices will be on node 0.
> 
> It doesn't seem great that this depends on kernel configuration (which
> is independent of HW configuration). It seems confusing for the user,
> and fragile.
> 
> Do we not have access to another way of grouping cores (e.g. a socket
> ID), that's independent of kernel configuration? That seems to be how
> the x86 uncore PMUs are handled.

I'm not sure how relevant the use case of a multi-node system without
CONFIG_NUMA is, but maybe we can get the socket ID from the
multiprocessor affinity register (MPIDR_EL1)? The AFF2 part (bits 23:16)
should contain the socket number on ThunderX.
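
As a sketch of that idea (in the kernel this would presumably be
MPIDR_AFFINITY_LEVEL(read_cpuid_mpidr(), 2); here the field is simply
extracted from a given register value, assuming Aff2 really does carry
the ThunderX socket number):

```c
#include <assert.h>
#include <stdint.h>

/* MPIDR_EL1 Aff2 field: bits 23:16. */
#define MPIDR_AFF2_SHIFT	16
#define MPIDR_AFF2_MASK		0xffULL

/* Extract the proposed socket id from an MPIDR_EL1 value. */
static unsigned int mpidr_to_socket(uint64_t mpidr)
{
	return (mpidr >> MPIDR_AFF2_SHIFT) & MPIDR_AFF2_MASK;
}
```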

Would that be better?

thanks,
Jan

> If we don't have that information, it really feels like we need
> additional info from FW (which would also solve the CPUID issue with
> point 1), or this is likely to be very fragile.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-04 12:19   ` Jan Glauber
@ 2016-04-25 11:22     ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-04-25 11:22 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

Hi Jan,

On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> Hi Mark,
> 
> can you have a look at these patches?

Looks like Mark reviewed this last week -- are you planning to respin?

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-25 11:22     ` Will Deacon
@ 2016-04-25 12:02       ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-25 12:02 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

Hi Will,

On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> Hi Jan,
> 
> On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > Hi Mark,
> > 
> > can you have a look at these patches?
> 
> Looks like Mark reviewed this last week -- are you planning to respin?
> 
> Will

Yes, of course. I just had no time yet and I'm a bit lost on how to
proceed without using the NUMA node information which Mark did not like
to be used.

The only way to know which device is on which node would be to look
at the PCI topology (which is also the source of the NUMA node_id).
We could do this manually in order to not depend on CONFIG_NUMA,
but I would like to know if that is acceptable before respinning the
patches.

Thanks!
Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-25 12:02       ` Jan Glauber
@ 2016-04-25 13:19         ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-04-25 13:19 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > can you have a look at these patches?
> > 
> > Looks like Mark reviewed this last week -- are you planning to respin?
> 
> Yes, of course. I just had no time yet and I'm a bit lost on how to
> proceed without using the NUMA node information which Mark did not like
> to be used.
> 
> The only way to know which device is on which node would be to look
> at the PCI topology (which is also the source of the NUMA node_id).
> We could do this manually in order to not depend on CONFIG_NUMA,
> but I would like to know if that is acceptable before respinning the
> patches.

That doesn't feel like it really addresses Mark's concerns -- it's just
another way to get the information that isn't a first-class PMU topology
description from firmware.

Now, I don't actually mind using the NUMA topology so much in the cases
where it genuinely correlates with the PMU topology. My objection is more
that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
result in working with an incorrect PMU topology and passing all of that
through to userspace.

So I'd prefer either making the driver depend on NUMA, or at the very least
failing to probe the PMU if we discover a socketed system and NUMA is not
selected. Do either of those work as a compromise?

Will

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-25 13:19         ` Will Deacon
@ 2016-04-26 12:08           ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-26 12:08 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel, David Daney

On Mon, Apr 25, 2016 at 02:19:07PM +0100, Will Deacon wrote:
> On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> > On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > > can you have a look at these patches?
> > > 
> > > Looks like Mark reviewed this last week -- are you planning to respin?
> > 
> > Yes, of course. I just had no time yet and I'm a bit lost on how to
> > proceed without using the NUMA node information which Mark did not like
> > to be used.
> > 
> > The only way to know which device is on which node would be to look
> > at the PCI topology (which is also the source of the NUMA node_id).
> > We could do this manually in order to not depend on CONFIG_NUMA,
> > but I would like to know if that is acceptable before respinning the
> > patches.
> 
> That doesn't feel like it really addresses Mark's concerns -- it's just
> another way to get the information that isn't a first-class PMU topology
> description from firmware.
> 
> Now, I don't actually mind using the NUMA topology so much in the cases
> where it genuinely correlates with the PMU topology. My objection is more
> that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
> result in working with an incorrect PMU topology and passing all of that
> through to userspace.
> 
> So I'd prefer either making the driver depend on NUMA, or at the very least
> failing to probe  the PMU if we discover a socketed system and NUMA is not
> selected. Do either of those work as a compromise?
> 
> Will

That sounds like a good compromise.

So I could do the following:

1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
   information to determine the device node

2) If CONFIG_NUMA is not set we check if we run on a socketed system

   a) In that case we return an error and give a message that CONFIG_NUMA needs
      to be enabled
   b) Otherwise we have a single node system and use node_id = 0
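
The decision logic above can be sketched as follows (a user-space
model with hypothetical names; in the driver, step 1 would use the PCI
device's node id and step 2 the detected socket count):

```c
#include <assert.h>
#include <errno.h>

/*
 * Proposed probe-time policy: with NUMA, trust the device node id;
 * without NUMA, refuse to probe a multi-socket system rather than
 * silently mislabel every unit as node 0.
 */
static int uncore_device_node(int numa_enabled, int dev_node,
			      int nr_sockets)
{
	if (numa_enabled)
		return dev_node;	/* step 1: use NUMA information */
	if (nr_sockets > 1)
		return -ENODEV;		/* step 2a: CONFIG_NUMA required */
	return 0;			/* step 2b: single-node system */
}
```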

David noted that it would also be possible to extract the node id from
the physical address of the device, but I'm not sure that qualifies as a
'first-class' topology description...

--Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* [PATCH v2 0/5] Cavium ThunderX uncore PMU support
@ 2016-04-26 12:08           ` Jan Glauber
  0 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-26 12:08 UTC (permalink / raw)
  To: linux-arm-kernel

On Mon, Apr 25, 2016 at 02:19:07PM +0100, Will Deacon wrote:
> On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> > On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > > can you have a look at these patches?
> > > 
> > > Looks like Mark reviewed this last week -- are you planning to respin?
> > 
> > Yes, of course. I just had no time yet and I'm a bit lost on how to
> > proceed without using the NUMA node information which Mark did not like
> > to be used.
> > 
> > The only way to know which device is on which node would be to look
> > at the PCI topology (which is also the source of the NUMA node_id).
> > We could do this manually in order to not depend on CONFIG_NUMA,
> > but I would like to know if that is acceptable before respinning the
> > patches.
> 
> That doesn't feel like it really addresses Mark's concerns -- it's just
> another way to get the information that isn't a first-class PMU topology
> description from firmware.
> 
> Now, I don't actually mind using the NUMA topology so much in the cases
> where it genuinely correlates with the PMU topology. My objection is more
> that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
> result in working with an incorrect PMU topology and passing all of that
> through to userspace.
> 
> So I'd prefer either making the driver depend on NUMA, or at the very least
> failing to probe  the PMU if we discover a socketed system and NUMA is not
> selected. Do either of those work as a compromise?
> 
> Will

That sounds like a good compromise.

So I could do the following:

1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
   information to determine the device node

2) If CONFIG_NUMA is not set we check if we run on a socketed system

   a) In that case we return an error and give a message that CONFIG_NUMA needs
      to be enabled
   b) Otherwise we have a single node system and use node_id = 0

David noted that it would also be possible to extract the node id from
the physical address of the device, but I'm not sure that qualifies as a
'first-class' topology description...

--Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-26 12:08           ` Jan Glauber
@ 2016-04-26 13:53             ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-04-26 13:53 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel, David Daney

On Tue, Apr 26, 2016 at 02:08:09PM +0200, Jan Glauber wrote:
> On Mon, Apr 25, 2016 at 02:19:07PM +0100, Will Deacon wrote:
> > On Mon, Apr 25, 2016 at 02:02:22PM +0200, Jan Glauber wrote:
> > > On Mon, Apr 25, 2016 at 12:22:07PM +0100, Will Deacon wrote:
> > > > On Mon, Apr 04, 2016 at 02:19:54PM +0200, Jan Glauber wrote:
> > > > > can you have a look at these patches?
> > > > 
> > > > Looks like Mark reviewed this last week -- are you planning to respin?
> > > 
> > > Yes, of course. I just had no time yet and I'm a bit lost on how to
> > > proceed without using the NUMA node information which Mark did not like
> > > to be used.
> > > 
> > > The only way to know which device is on which node would be to look
> > > at the PCI topology (which is also the source of the NUMA node_id).
> > > We could do this manually in order to not depend on CONFIG_NUMA,
> > > but I would like to know if that is acceptable before respinning the
> > > patches.
> > 
> > That doesn't feel like it really addresses Mark's concerns -- it's just
> > another way to get the information that isn't a first-class PMU topology
> > description from firmware.
> > 
> > Now, I don't actually mind using the NUMA topology so much in the cases
> > where it genuinely correlates with the PMU topology. My objection is more
> > that we end up sticking everything on node 0 if !CONFIG_NUMA, which could
> > result in working with an incorrect PMU topology and passing all of that
> > through to userspace.
> > 
> > So I'd prefer either making the driver depend on NUMA, or at the very least
> > failing to probe the PMU if we discover a socketed system and NUMA is not
> > selected. Do either of those work as a compromise?
> > 
> > Will
> 
> That sounds like a good compromise.
> 
> So I could do the following:
> 
> 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
>    information to determine the device node
> 
> 2) If CONFIG_NUMA is not set we check if we run on a socketed system
> 
>    a) In that case we return an error and give a message that CONFIG_NUMA needs
>       to be enabled
>    b) Otherwise we have a single node system and use node_id = 0

That sounds sensible to me. How do you "check if we run on a socketed
system"? My assumption would be that you could figure this out from the
firmware tables?

> David noted that it would also be possible to extract the node id from
> the physical address of the device, but I'm not sure that classifies as
> 'first-class' topology description...

I'd rather avoid this sort of probing, as it inevitably breaks when it
sees new hardware that doesn't follow the unwritten assumptions of the
old hardware.

Will

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-26 13:53             ` Will Deacon
@ 2016-04-27 10:51               ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-04-27 10:51 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel, David Daney

On Tue, Apr 26, 2016 at 02:53:54PM +0100, Will Deacon wrote:

[...]

> > 
> > That sounds like a good compromise.
> > 
> > So I could do the following:
> > 
> > 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
> >    information to determine the device node
> > 
> > 2) If CONFIG_NUMA is not set we check if we run on a socketed system
> > 
> >    a) In that case we return an error and give a message that CONFIG_NUMA needs
> >       to be enabled
> >    b) Otherwise we have a single node system and use node_id = 0
> 
> That sounds sensible to me. How do you "check if we run on a socketed
> system"? My assumption would be that you could figure this out from the
> firmware tables?

There are probably multiple ways to detect a socketed system, some of them
quite hardware-specific. I would like to avoid parsing DT (and ACPI) though,
if possible.

A generic approach would be to query the multiprocessor affinity register
(MPIDR_EL1) on all CPUs. The Aff2 field (bits 23:16) contains the socket
number on ThunderX. If it is non-zero on any CPU, I would assume a
socketed system.

Would that be feasible?

thanks,
Jan

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-04-27 10:51               ` Jan Glauber
@ 2016-04-27 11:18                 ` Mark Rutland
  -1 siblings, 0 replies; 50+ messages in thread
From: Mark Rutland @ 2016-04-27 11:18 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Will Deacon, linux-kernel, linux-arm-kernel, David Daney

On Wed, Apr 27, 2016 at 12:51:56PM +0200, Jan Glauber wrote:
> On Tue, Apr 26, 2016 at 02:53:54PM +0100, Will Deacon wrote:
> 
> [...]
> 
> > > 
> > > That sounds like a good compromise.
> > > 
> > > So I could do the following:
> > > 
> > > 1) In the uncore setup check for CONFIG_NUMA, if set use the NUMA
> > >    information to determine the device node
> > > 
> > > 2) If CONFIG_NUMA is not set we check if we run on a socketed system
> > > 
> > >    a) In that case we return an error and give a message that CONFIG_NUMA needs
> > >       to be enabled
> > >    b) Otherwise we have a single node system and use node_id = 0
> > 
> > That sounds sensible to me. How do you "check if we run on a socketed
> > system"? My assumption would be that you could figure this out from the
> > firmware tables?
> 
> There are probably multiple ways to detect a socketed system, with some quite
> hardware specific. I would like to avoid parsing DT (and ACPI) though,
> if possible.
> 
> A generic approach would be to do a query of the multiprocessor affinity
> register (MPIDR_EL1) on all CPUs. The AFF2 part (bits 23:16) contains the 
> socket number on ThunderX. If this is non-zero on any CPU I would assume a
> socketed system.
> 
> Would that be feasible?

As with checking the physical address of a peripheral, this is an
unwritten assumption, and I suspect that similarly, it will inevitably
break (e.g. if Aff3 becomes used).

If you expect kernels relevant to your platform to have NUMA support,
you can simply depend on NUMA to determine whether or not you have NUMA
nodes.

Regarding relying on NUMA nodes, I have two concerns:

In general a NUMA node is not necessarily a socket, as you can have NUMA
properties even within a socket. If you can guarantee that for your
platform NUMA nodes will always be sockets, then I guess using NUMA
nodes is ok, though I imagine that as with the physical address map and
organisation of CPU IDs, that's difficult to have set in stone.

Linux NUMA node IDs are arbitrary tokens, and may not necessarily identity-map
to documented socket IDs for your platform (even if they happen to
today). If you're happy to have users figure out how those IDs map to
sockets, that's fine, but otherwise you need to expose additional
information so that users get what they expect (at which point, if you
have said information, we probably don't need the NUMA information).

Thanks,
Mark.

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-03-09 16:21 ` Jan Glauber
@ 2016-06-28 10:24   ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-06-28 10:24 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

Hi Jan,

On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> This patch series provides access to various counters on the ThunderX SOC.
> 
> For details of the uncore implementation see patch #1.
> 
> Patches #2-5 add the various ThunderX specific PMUs.
> 
> As suggested I've put the files under drivers/perf/uncore. I would
> prefer this location over drivers/bus because not all of the uncore
> drivers are bus related.

What's the status of these patches? Were you planning to send a new
version?

Will

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-06-28 10:24   ` Will Deacon
@ 2016-06-28 14:04     ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-06-28 14:04 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Tue, Jun 28, 2016 at 11:24:20AM +0100, Will Deacon wrote:
> Hi Jan,
> 
> On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> > This patch series provides access to various counters on the ThunderX SOC.
> > 
> > For details of the uncore implementation see patch #1.
> > 
> > Patches #2-5 add the various ThunderX specific PMUs.
> > 
> > As suggested I've put the files under drivers/perf/uncore. I would
> > prefer this location over drivers/bus because not all of the uncore
> > drivers are bus related.
> 
> What's the status of these patches? Were you planning to send a new
> version?
> 
> Will

Hi Will,

I was half-way through addressing Mark's review comments when I got
side-tracked.

The principal question these patches raised remains open though, in my
opinion: how to determine which socket a device belongs to.

There is no first-class interface to ask a device or the firmware
which socket the device lives on.

The options I see are:
A) Using NUMA node information, depends on CONFIG_NUMA
B) Decoding the socket bits of the PCI BAR address
C) Using PCI topology information

A is what I tried, but I agree that depending on CONFIG_NUMA is not a good
solution. B would be easy but does not look very future-proof. So option C
is what is left...

thanks,
Jan

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-06-28 14:04     ` Jan Glauber
@ 2016-07-04 10:11       ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-07-04 10:11 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Tue, Jun 28, 2016 at 04:04:59PM +0200, Jan Glauber wrote:
> On Tue, Jun 28, 2016 at 11:24:20AM +0100, Will Deacon wrote:
> > On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> > > This patch series provides access to various counters on the ThunderX SOC.
> > > 
> > > For details of the uncore implementation see patch #1.
> > > 
> > > Patches #2-5 add the various ThunderX specific PMUs.
> > > 
> > > As suggested I've put the files under drivers/perf/uncore. I would
> > > prefer this location over drivers/bus because not all of the uncore
> > > drivers are bus related.
> > 
> > What's the status of these patches? Were you planning to send a new
> > version?
>
> I was half-way through with addressing Mark's review comments when
> got side-tracked.
> 
> The principle question these patches raised remains open though in my
> opinion, how to determine the socket a device belongs to.
> 
> There is no first-class interface to ask a device or the firmware
> which socket the device lives on.
> 
> The options I see are:
> A) Using NUMA node information, depends on CONFIG_NUMA
> B) Decoding the socket bits of the PCI BAR address
> C) Using PCI topology information
> 
> A is what I tried, but I agree that depending on CONFIG_NUMA is not a good
> solution. B would be easy but looks not very future-proof. So option C
> is what is left...

Sorry to go full circle on this, but "depends on NUMA" sounds better
than deriving NUMA topology from PCI to me. The only worry I have is if
the NUMA information ends up being insufficient in the long-term, and we
end up with a mixture of the three options above in order to figure out
the PMU topology.

As long as you're happy that the PMU:NUMA topology remains 1:1, then I
have no objections. The moment you need extra hacks on the side, we should
probably drop the NUMA dependency altogether and figure it out some other
way.

Will

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-07-04 10:11       ` Will Deacon
@ 2016-09-16  7:55         ` Will Deacon
  -1 siblings, 0 replies; 50+ messages in thread
From: Will Deacon @ 2016-09-16  7:55 UTC (permalink / raw)
  To: Jan Glauber; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

Hi Jan,

On Mon, Jul 04, 2016 at 11:11:32AM +0100, Will Deacon wrote:
> On Tue, Jun 28, 2016 at 04:04:59PM +0200, Jan Glauber wrote:
> > On Tue, Jun 28, 2016 at 11:24:20AM +0100, Will Deacon wrote:
> > > On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> > > > This patch series provides access to various counters on the ThunderX SOC.
> > > > 
> > > > For details of the uncore implementation see patch #1.
> > > > 
> > > > Patches #2-5 add the various ThunderX specific PMUs.
> > > > 
> > > > As suggested I've put the files under drivers/perf/uncore. I would
> > > > prefer this location over drivers/bus because not all of the uncore
> > > > drivers are bus related.
> > > 
> > > What's the status of these patches? Were you planning to send a new
> > > version?
> >
> > I was half-way through with addressing Mark's review comments when
> > got side-tracked.
> > 
> > The principle question these patches raised remains open though in my
> > opinion, how to determine the socket a device belongs to.
> > 
> > There is no first-class interface to ask a device or the firmware
> > which socket the device lives on.
> > 
> > The options I see are:
> > A) Using NUMA node information, depends on CONFIG_NUMA
> > B) Decoding the socket bits of the PCI BAR address
> > C) Using PCI topology information
> > 
> > A is what I tried, but I agree that depending on CONFIG_NUMA is not a good
> > solution. B would be easy but looks not very future-proof. So option C
> > is what is left...
> 
> Sorry to go full circle on this, but "depends on NUMA" sounds better
> than deriving NUMA topology from PCI to me. The only worry I have is if
> the NUMA information ends up being insufficient in the long-term, and we
> end up with a mixture of the three options above in order to figure out
> the PMU topology.
> 
> As long as you're happy that the PMU:NUMA topology remains 1:1, then I
> have no objections. The moment you need extra hacks on the side, we should
> probably drop the NUMA dependency altogether and figure it out some other
> way.

Any news on this series, or did I miss a v3? I was hoping to have this in
for 4.9, but it seems to have stalled :(

Will

* Re: [PATCH v2 0/5] Cavium ThunderX uncore PMU support
  2016-09-16  7:55         ` Will Deacon
@ 2016-09-16  8:39           ` Jan Glauber
  -1 siblings, 0 replies; 50+ messages in thread
From: Jan Glauber @ 2016-09-16  8:39 UTC (permalink / raw)
  To: Will Deacon; +Cc: Mark Rutland, linux-kernel, linux-arm-kernel

On Fri, Sep 16, 2016 at 08:55:24AM +0100, Will Deacon wrote:
> Hi Jan,
> 
> On Mon, Jul 04, 2016 at 11:11:32AM +0100, Will Deacon wrote:
> > On Tue, Jun 28, 2016 at 04:04:59PM +0200, Jan Glauber wrote:
> > > On Tue, Jun 28, 2016 at 11:24:20AM +0100, Will Deacon wrote:
> > > > On Wed, Mar 09, 2016 at 05:21:02PM +0100, Jan Glauber wrote:
> > > > > This patch series provides access to various counters on the ThunderX SOC.
> > > > > 
> > > > > For details of the uncore implementation see patch #1.
> > > > > 
> > > > > Patches #2-5 add the various ThunderX specific PMUs.
> > > > > 
> > > > > As suggested I've put the files under drivers/perf/uncore. I would
> > > > > prefer this location over drivers/bus because not all of the uncore
> > > > > drivers are bus related.
> > > > 
> > > > What's the status of these patches? Were you planning to send a new
> > > > version?
> > >
> > > I was half-way through with addressing Mark's review comments when
> > > got side-tracked.
> > > 
> > > The principle question these patches raised remains open though in my
> > > opinion, how to determine the socket a device belongs to.
> > > 
> > > There is no first-class interface to ask a device or the firmware
> > > which socket the device lives on.
> > > 
> > > The options I see are:
> > > A) Using NUMA node information, depends on CONFIG_NUMA
> > > B) Decoding the socket bits of the PCI BAR address
> > > C) Using PCI topology information
> > > 
> > > A is what I tried, but I agree that depending on CONFIG_NUMA is not a good
> > > solution. B would be easy but looks not very future-proof. So option C
> > > is what is left...
> > 
> > Sorry to go full circle on this, but "depends on NUMA" sounds better
> > than deriving NUMA topology from PCI to me. The only worry I have is if
> > the NUMA information ends up being insufficient in the long-term, and we
> > end up with a mixture of the three options above in order to figure out
> > the PMU topology.
> > 
> > As long as you're happy that the PMU:NUMA topology remains 1:1, then I
> > have no objections. The moment you need extra hacks on the side, we should
> > probably drop the NUMA dependency altogether and figure it out some other
> > way.
> 
> Any news on this series, or did I miss a v3? I was hoping to have this in
> for 4.9, but it seems to have stalled :(
> 
> Will

No news, I'm afraid it is stalled on my side :( I'll try to get back to
it, but not for 4.9.

Jan

^ permalink raw reply	[flat|nested] 50+ messages in thread


Thread overview: 50+ messages
2016-03-09 16:21 [PATCH v2 0/5] Cavium ThunderX uncore PMU support Jan Glauber
2016-03-09 16:21 ` [PATCH v2 1/5] arm64/perf: Basic uncore counter support for Cavium ThunderX Jan Glauber
2016-04-19 15:06   ` Mark Rutland
2016-04-20 12:29     ` Jan Glauber
2016-03-09 16:21 ` [PATCH v2 2/5] arm64/perf: Cavium ThunderX L2C TAD uncore support Jan Glauber
2016-04-19 15:43   ` Mark Rutland
2016-03-09 16:21 ` [PATCH v2 3/5] arm64/perf: Cavium ThunderX L2C CBC " Jan Glauber
2016-04-19 15:56   ` Mark Rutland
2016-03-09 16:21 ` [PATCH v2 4/5] arm64/perf: Cavium ThunderX LMC " Jan Glauber
2016-03-09 16:21 ` [PATCH v2 5/5] arm64/perf: Cavium ThunderX OCX TLK " Jan Glauber
2016-04-04 12:19 ` [PATCH v2 0/5] Cavium ThunderX uncore PMU support Jan Glauber
2016-04-25 11:22   ` Will Deacon
2016-04-25 12:02     ` Jan Glauber
2016-04-25 13:19       ` Will Deacon
2016-04-26 12:08         ` Jan Glauber
2016-04-26 13:53           ` Will Deacon
2016-04-27 10:51             ` Jan Glauber
2016-04-27 11:18               ` Mark Rutland
     [not found] ` <CAEiAFz3eCsX3VoNus_Rq+En5zuB8fAxNCbC3ktw2NqLKwC=_kA@mail.gmail.com>
2016-04-19 10:35   ` Jan Glauber
2016-04-19 16:03     ` Mark Rutland
2016-06-28 10:24 ` Will Deacon
2016-06-28 14:04   ` Jan Glauber
2016-07-04 10:11     ` Will Deacon
2016-09-16  7:55       ` Will Deacon
2016-09-16  8:39         ` Jan Glauber
