From: Jan Glauber <jan.glauber@caviumnetworks.com>
To: Will Deacon <will.deacon@arm.com>
Cc: Mark Rutland <mark.rutland@arm.com>,
	<linux-kernel@vger.kernel.org>,
	<linux-arm-kernel@lists.infradead.org>
Subject: Re: [PATCH v4 1/5] arm64: perf: Basic uncore counter support for Cavium ThunderX SOC
Date: Fri, 11 Nov 2016 11:30:29 +0100	[thread overview]
Message-ID: <20161111103029.GD16907@hardcore> (raw)
In-Reply-To: <20161108235010.GC17771@arm.com>

Hi Will,

thanks for the review!

On Tue, Nov 08, 2016 at 11:50:10PM +0000, Will Deacon wrote:
> Hi Jan,
> 
> Thanks for posting an updated series. I have a few minor comments, which
> we can hopefully address in time for 4.10.
> 
> Also, have you run the perf fuzzer with this series applied?

No, that's new to me. Will try it.

> https://github.com/deater/perf_event_tests
> 
> (build the tests and look under the fuzzer/ directory for the tool)
> 
> On Sat, Oct 29, 2016 at 01:55:29PM +0200, Jan Glauber wrote:
> > Provide "uncore" facilities for different non-CPU performance
> > counter units.
> > 
> > The uncore PMUs can be found under /sys/bus/event_source/devices.
> > All counters are exported via sysfs in the corresponding events
> > files under the PMU directory so the perf tool can list the event names.
> > 
> > There are some points that are special in this implementation:
> > 
> > 1) The PMU detection relies on PCI device detection. If a
> >    matching PCI device is found the PMU is created. The code can deal
> >    with multiple units of the same type, e.g. more than one memory
> >    controller.
> > 
> > 2) Counters are summarized across different units of the same type
> >    on one NUMA node but not across NUMA nodes.
> >    For instance L2C TAD 0..7 are presented as a single counter
> >    (adding the values from TAD 0 to 7). Although this loses the ability
> >    to read a single value, the merged values are easier to use.
> > 
> > 3) The counters are not CPU related. A random CPU is picked regardless
> >    of the NUMA node. There is a small performance penalty for accessing
> >    counters on a remote node but reading a performance counter is a
> >    slow operation anyway.
> > 
> > Signed-off-by: Jan Glauber <jglauber@cavium.com>
> > ---
> >  drivers/perf/Kconfig                |  13 ++
> >  drivers/perf/Makefile               |   1 +
> >  drivers/perf/uncore/Makefile        |   1 +
> >  drivers/perf/uncore/uncore_cavium.c | 351 ++++++++++++++++++++++++++++++++++++
> >  drivers/perf/uncore/uncore_cavium.h |  71 ++++++++
> 
> We already have "uncore" PMUs under drivers/perf, so I'd prefer that we
> renamed this a bit to reflect better what's going on. How about:
> 
>   drivers/perf/cavium/
> 
> and then
> 
>   drivers/perf/cavium/uncore_thunder.[ch]
> 
> ?

OK, agreed.

> >  include/linux/cpuhotplug.h          |   1 +
> >  6 files changed, 438 insertions(+)
> >  create mode 100644 drivers/perf/uncore/Makefile
> >  create mode 100644 drivers/perf/uncore/uncore_cavium.c
> >  create mode 100644 drivers/perf/uncore/uncore_cavium.h
> > 
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 4d5c5f9..3266c87 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -19,4 +19,17 @@ config XGENE_PMU
> >          help
> >            Say y if you want to use APM X-Gene SoC performance monitors.
> >  
> > +config UNCORE_PMU
> > +	bool
> 
> This isn't needed.

I thought about having a common symbol for all uncore PMUs. But since
drivers/perf already serves that purpose the symbol can be removed.

> > +
> > +config UNCORE_PMU_CAVIUM
> > +	depends on PERF_EVENTS && NUMA && ARM64
> > +	bool "Cavium uncore PMU support"
> 
> Please mention Thunder somewhere, since that's the SoC being supported.

OK.

> > +	select UNCORE_PMU
> > +	default y
> > +	help
> > +	  Say y if you want to access performance counters of subsystems
> > +	  on a Cavium SOC like cache controller, memory controller or
> > +	  processor interconnect.
> > +
> >  endmenu
> > diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> > index b116e98..d6c02c9 100644
> > --- a/drivers/perf/Makefile
> > +++ b/drivers/perf/Makefile
> > @@ -1,2 +1,3 @@
> >  obj-$(CONFIG_ARM_PMU) += arm_pmu.o
> >  obj-$(CONFIG_XGENE_PMU) += xgene_pmu.o
> > +obj-y += uncore/
> > diff --git a/drivers/perf/uncore/Makefile b/drivers/perf/uncore/Makefile
> > new file mode 100644
> > index 0000000..6130e18
> > --- /dev/null
> > +++ b/drivers/perf/uncore/Makefile
> > @@ -0,0 +1 @@
> > +obj-$(CONFIG_UNCORE_PMU_CAVIUM) += uncore_cavium.o
> > diff --git a/drivers/perf/uncore/uncore_cavium.c b/drivers/perf/uncore/uncore_cavium.c
> > new file mode 100644
> > index 0000000..a7b4277
> > --- /dev/null
> > +++ b/drivers/perf/uncore/uncore_cavium.c
> > @@ -0,0 +1,351 @@
> > +/*
> > + * Cavium Thunder uncore PMU support.
> > + *
> > + * Copyright (C) 2015,2016 Cavium Inc.
> > + * Author: Jan Glauber <jan.glauber@cavium.com>
> > + */
> > +
> > +#include <linux/cpufeature.h>
> > +#include <linux/numa.h>
> > +#include <linux/slab.h>
> > +
> > +#include "uncore_cavium.h"
> > +
> > +/*
> > + * Some notes about the various counters supported by this "uncore" PMU
> > + * and the design:
> > + *
> > + * All counters are 64 bit long.
> > + * There are no overflow interrupts.
> > + * Counters are summarized per node/socket.
> > + * Most devices appear as separate PCI devices per socket with the exception
> > + * of OCX TLK which appears as one PCI device per socket and contains several
> > + * units with counters that are merged.
> > + * Some counters are selected via a control register (L2C TAD) and read by
> > + * a number of counter registers, others (L2C CBC, LMC & OCX TLK) have
> > + * one dedicated counter per event.
> > + * Some counters are not stoppable (L2C CBC & LMC).
> > + * Some counters are read-only (LMC).
> > + * All counters belong to PCI devices, the devices may have additional
> > + * drivers but we assume we are the only user of the counter registers.
> > + * We map the whole PCI BAR so we must be careful to forbid access to
> > + * addresses that contain neither counters nor counter control registers.
> > + */
> > +
> > +void thunder_uncore_read(struct perf_event *event)
> > +{
> 
> Rather than have a bunch of global symbols that are called from the
> individual drivers, why don't you pass a struct of function pointers to
> their respective init functions and keep the internals private?

Most of these functions are already in struct pmu. What I can do is
set the shared functions as defaults in thunder_uncore_setup and only
override them where needed (like thunder_uncore_read_ocx_tlk), or do it
the other way around (use the default if nothing is set already).
Then I can get rid of the global symbols.
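
Roughly like this (untested sketch, only to illustrate the idea; the
"thunder_ocx_tlk" pmu name below is a placeholder):

	int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
					struct pmu *pmu, int counters)
	{
		/* install the shared callbacks unless the driver set its own */
		if (!pmu->event_init)
			pmu->event_init = thunder_uncore_event_init;
		if (!pmu->read)
			pmu->read = thunder_uncore_read;
		if (!pmu->start)
			pmu->start = thunder_uncore_start;
		if (!pmu->stop)
			pmu->stop = thunder_uncore_stop;

		/* ... rest of the setup as before ... */
		return 0;
	}

A unit like OCX TLK would then only fill in what it overrides:

	static struct pmu thunder_ocx_tlk_pmu = {
		.name = "thunder_ocx_tlk",
		.read = thunder_uncore_read_ocx_tlk,
	};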

> > +	struct thunder_uncore *uncore = to_uncore(event->pmu);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct thunder_uncore_node *node;
> > +	struct thunder_uncore_unit *unit;
> > +	u64 prev, delta, new = 0;
> > +
> > +	node = get_node(hwc->config, uncore);
> > +
> > +	/* read counter values from all units on the node */
> > +	list_for_each_entry(unit, &node->unit_list, entry)
> > +		new += readq(hwc->event_base + unit->map);
> > +
> > +	prev = local64_read(&hwc->prev_count);
> > +	local64_set(&hwc->prev_count, new);
> > +	delta = new - prev;
> > +	local64_add(delta, &event->count);
> > +}
> > +
> > +int thunder_uncore_add(struct perf_event *event, int flags, u64 config_base,
> > +		       u64 event_base)
> > +{
> > +	struct thunder_uncore *uncore = to_uncore(event->pmu);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct thunder_uncore_node *node;
> > +	int id;
> > +
> > +	node = get_node(hwc->config, uncore);
> > +	id = get_id(hwc->config);
> > +
> > +	if (!cmpxchg(&node->events[id], NULL, event))
> > +		hwc->idx = id;
> 
> Does this need to be a full-fat cmpxchg? Who are you racing with?

Just copy'n'paste from the existing drivers. I guess it can be relaxed.

> > +
> > +	if (hwc->idx == -1)
> > +		return -EBUSY;
> 
> This would be much clearer as an else statement after the cmpxchg.

Agreed.
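
Something like this then (sketch):

	if (cmpxchg(&node->events[id], NULL, event) == NULL)
		hwc->idx = id;
	else
		return -EBUSY;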

> > +
> > +	hwc->config_base = config_base;
> > +	hwc->event_base = event_base;
> > +	hwc->state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
> > +
> > +	if (flags & PERF_EF_START)
> > +		uncore->pmu.start(event, PERF_EF_RELOAD);
> > +
> > +	return 0;
> > +}
> > +
> > +void thunder_uncore_del(struct perf_event *event, int flags)
> > +{
> > +	struct thunder_uncore *uncore = to_uncore(event->pmu);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct thunder_uncore_node *node;
> > +	int i;
> > +
> > +	event->pmu->stop(event, PERF_EF_UPDATE);
> > +
> > +	/*
> > +	 * For programmable counters we need to check where we installed it.
> > +	 * To keep this function generic always test the more complicated
> > +	 * case (free running counters won't need the loop).
> > +	 */
> > +	node = get_node(hwc->config, uncore);
> > +	for (i = 0; i < node->num_counters; i++) {
> > +		if (cmpxchg(&node->events[i], event, NULL) == event)
> > +			break;
> > +	}
> > +	hwc->idx = -1;
> > +}
> > +
> > +void thunder_uncore_start(struct perf_event *event, int flags)
> > +{
> > +	struct thunder_uncore *uncore = to_uncore(event->pmu);
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct thunder_uncore_node *node;
> > +	struct thunder_uncore_unit *unit;
> > +	u64 new = 0;
> > +
> > +	/* read counter values from all units on the node */
> > +	node = get_node(hwc->config, uncore);
> > +	list_for_each_entry(unit, &node->unit_list, entry)
> > +		new += readq(hwc->event_base + unit->map);
> > +	local64_set(&hwc->prev_count, new);
> > +
> > +	hwc->state = 0;
> > +	perf_event_update_userpage(event);
> > +}
> > +
> > +void thunder_uncore_stop(struct perf_event *event, int flags)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +
> > +	hwc->state |= PERF_HES_STOPPED;
> > +
> > +	if ((flags & PERF_EF_UPDATE) && !(hwc->state & PERF_HES_UPTODATE)) {
> > +		thunder_uncore_read(event);
> > +		hwc->state |= PERF_HES_UPTODATE;
> > +	}
> > +}
> > +
> > +int thunder_uncore_event_init(struct perf_event *event)
> > +{
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	struct thunder_uncore_node *node;
> > +	struct thunder_uncore *uncore;
> > +
> > +	if (event->attr.type != event->pmu->type)
> > +		return -ENOENT;
> > +
> > +	/* we do not support sampling */
> > +	if (is_sampling_event(event))
> > +		return -EINVAL;
> > +
> > +	/* counters do not have these bits */
> > +	if (event->attr.exclude_user	||
> > +	    event->attr.exclude_kernel	||
> > +	    event->attr.exclude_host	||
> > +	    event->attr.exclude_guest	||
> > +	    event->attr.exclude_hv	||
> > +	    event->attr.exclude_idle)
> > +		return -EINVAL;
> > +
> > +	uncore = to_uncore(event->pmu);
> > +	if (!uncore)
> > +		return -ENODEV;
> > +	if (!uncore->event_valid(event->attr.config & UNCORE_EVENT_ID_MASK))
> > +		return -EINVAL;
> > +
> > +	/* check NUMA node */
> > +	node = get_node(event->attr.config, uncore);
> > +	if (!node) {
> > +		pr_debug("Invalid NUMA node selected\n");
> > +		return -EINVAL;
> > +	}
> > +
> > +	hwc->config = event->attr.config;
> > +	hwc->idx = -1;
> > +	return 0;
> > +}
> > +
> > +static ssize_t thunder_uncore_attr_show_cpumask(struct device *dev,
> > +						struct device_attribute *attr,
> > +						char *buf)
> > +{
> > +	struct pmu *pmu = dev_get_drvdata(dev);
> > +	struct thunder_uncore *uncore =
> > +		container_of(pmu, struct thunder_uncore, pmu);
> > +
> > +	return cpumap_print_to_pagebuf(true, buf, &uncore->active_mask);
> > +}
> > +static DEVICE_ATTR(cpumask, S_IRUGO, thunder_uncore_attr_show_cpumask, NULL);
> > +
> > +static struct attribute *thunder_uncore_attrs[] = {
> > +	&dev_attr_cpumask.attr,
> > +	NULL,
> > +};
> > +
> > +struct attribute_group thunder_uncore_attr_group = {
> > +	.attrs = thunder_uncore_attrs,
> > +};
> > +
> > +ssize_t thunder_events_sysfs_show(struct device *dev,
> > +				  struct device_attribute *attr,
> > +				  char *page)
> > +{
> > +	struct perf_pmu_events_attr *pmu_attr =
> > +		container_of(attr, struct perf_pmu_events_attr, attr);
> > +
> > +	if (pmu_attr->event_str)
> > +		return sprintf(page, "%s", pmu_attr->event_str);
> > +
> > +	return 0;
> > +}
> > +
> > +/* node attribute depending on number of NUMA nodes */
> > +static ssize_t node_show(struct device *dev, struct device_attribute *attr,
> > +			 char *page)
> > +{
> > +	if (NODES_SHIFT)
> > +		return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);
> 
> If NODES_SHIFT is 1, you'll end up with "config:16-16", which might confuse
> userspace.

So should I use "config:16" in that case? Is it OK to use this also for
NODES_SHIFT=0 ?
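
I.e. something like this (sketch):

	if (NODES_SHIFT > 1)
		return sprintf(page, "config:16-%d\n", 16 + NODES_SHIFT - 1);

	/* assuming "config:16" is fine for both NODES_SHIFT == 0 and 1 */
	return sprintf(page, "config:16\n");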

> > +	else
> > +		return sprintf(page, "config:16\n");
> > +}
> > +
> > +struct device_attribute format_attr_node = __ATTR_RO(node);
> > +
> > +/*
> > + * Thunder uncore events are independent from CPUs. Provide a cpumask
> > + * nevertheless to prevent perf from adding the event per-cpu and just
> > + * set the mask to one online CPU. Use the same cpumask for all uncore
> > + * devices.
> > + *
> > + * There is a performance penalty for accessing a device from a CPU on
> > + * another socket, but we do not care (yet).
> > + */
> > +static int thunder_uncore_offline_cpu(unsigned int old_cpu, struct hlist_node *node)
> > +{
> > +	struct thunder_uncore *uncore = hlist_entry_safe(node, struct thunder_uncore, node);
> 
> Why _safe?

Not required, will remove.

> > +	int new_cpu;
> > +
> > +	if (!cpumask_test_and_clear_cpu(old_cpu, &uncore->active_mask))
> > +		return 0;
> > +	new_cpu = cpumask_any_but(cpu_online_mask, old_cpu);
> > +	if (new_cpu >= nr_cpu_ids)
> > +		return 0;
> > +	perf_pmu_migrate_context(&uncore->pmu, old_cpu, new_cpu);
> > +	cpumask_set_cpu(new_cpu, &uncore->active_mask);
> > +	return 0;
> > +}
> > +
> > +static struct thunder_uncore_node * __init alloc_node(struct thunder_uncore *uncore,
> > +						      int node_id, int counters)
> > +{
> > +	struct thunder_uncore_node *node;
> > +
> > +	node = kzalloc(sizeof(*node), GFP_KERNEL);
> > +	if (!node)
> > +		return NULL;
> > +	node->num_counters = counters;
> > +	INIT_LIST_HEAD(&node->unit_list);
> > +	return node;
> > +}
> > +
> > +int __init thunder_uncore_setup(struct thunder_uncore *uncore, int device_id,
> > +				struct pmu *pmu, int counters)
> > +{
> > +	unsigned int vendor_id = PCI_VENDOR_ID_CAVIUM;
> > +	struct thunder_uncore_unit  *unit, *tmp;
> > +	struct thunder_uncore_node *node;
> > +	struct pci_dev *pdev = NULL;
> > +	int ret, node_id, found = 0;
> > +
> > +	/* detect PCI devices */
> > +	while ((pdev = pci_get_device(vendor_id, device_id, pdev))) {
> 
> Redundant brackets?

OK

> > +		if (!pdev)
> > +			break;
> 
> Redundant check?

OK
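
The detection loop head then simply becomes, e.g.:

	while ((pdev = pci_get_device(vendor_id, device_id, pdev)) != NULL) {

without the extra NULL check in the body.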

> > +		node_id = dev_to_node(&pdev->dev);
> > +
> > +		/* allocate node if necessary */
> > +		if (!uncore->nodes[node_id])
> > +			uncore->nodes[node_id] = alloc_node(uncore, node_id, counters);
> > +
> > +		node = uncore->nodes[node_id];
> > +		if (!node) {
> > +			ret = -ENOMEM;
> > +			goto fail;
> > +		}
> > +
> > +		unit = kzalloc(sizeof(*unit), GFP_KERNEL);
> > +		if (!unit) {
> > +			ret = -ENOMEM;
> > +			goto fail;
> > +		}
> > +
> > +		unit->pdev = pdev;
> > +		unit->map = ioremap(pci_resource_start(pdev, 0),
> > +				    pci_resource_len(pdev, 0));
> > +		list_add(&unit->entry, &node->unit_list);
> > +		node->nr_units++;
> > +		found++;
> > +	}
> > +
> > +	if (!found)
> > +		return -ENODEV;
> > +
> > +	cpuhp_state_add_instance_nocalls(CPUHP_AP_UNCORE_CAVIUM_ONLINE,
> > +                                         &uncore->node);
> > +
> > +	/*
> > +	 * perf PMU is CPU dependent in contrast to our uncore devices.
> > +	 * Just pick a CPU and migrate away if it goes offline.
> > +	 */
> > +	cpumask_set_cpu(smp_processor_id(), &uncore->active_mask);
> > +
> > +	uncore->pmu = *pmu;
> > +	ret = perf_pmu_register(&uncore->pmu, uncore->pmu.name, -1);
> > +	if (ret)
> > +		goto fail;
> > +
> > +	return 0;
> > +
> > +fail:
> > +	node_id = 0;
> > +	while (uncore->nodes[node_id]) {
> > +		node = uncore->nodes[node_id];
> > +
> > +		list_for_each_entry_safe(unit, tmp, &node->unit_list, entry) {
> 
> Why do you need the _safe variant?

OK, not needed

> Will
