linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH v8 00/10] IMC Instrumentation Support
@ 2017-05-04 14:19 Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
                   ` (9 more replies)
  0 siblings, 10 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Power9 has In-Memory-Collection (IMC) infrastructure which contains
various Performance Monitoring Units (PMUs) at Nest level (these are
on-chip but off-core), Core level and Thread level.

The Nest PMU counters are handled by a Nest IMC microcode which runs
in the OCC (On-Chip Controller) complex. The microcode collects the
counter data and moves the nest IMC counter data to memory.

The Core and Thread IMC PMU counters are handled in the core. Core
level PMU counters give us the IMC counters' data per core and thread
level PMU counters give us the IMC counters' data per CPU thread.

This patchset enables the nest IMC, core IMC and thread IMC PMUs. It
is based on the initial work, "Nest Instrumentation Support", done by
Madhavan Srinivasan:
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html

v1 of this patchset can be found here:
https://lwn.net/Articles/705475/

Nest events:
Per-chip nest instrumentation provides various per-chip metrics
such as memory, powerbus, Xlink and Alink bandwidth.

Core events:
Per-core IMC instrumentation provides various per-core metrics
such as non-idle cycles, non-idle instructions, various cache and
memory related metrics etc.

Thread events:
The thread-level events are the same as the core-level events; only
the domain differs. These are per-CPU metrics.

PMU Events' Information:
OPAL obtains the IMC PMU and event information from the IMC Catalog
and passes it on to the kernel via the device tree. The events'
information contains:
 - Event name
 - Event offset
 - Event description
and, optionally:
 - Event scale
 - Event unit

Some PMUs may have common scale and unit values for all their
supported events. In those cases, the events inherit the scale and
unit properties from the PMU.

The event offset in the memory is where the counter data gets
accumulated.

The OPAL-side patches are posted upstream :
https://lists.ozlabs.org/pipermail/skiboot/2017-May/007167.html

The kernel discovers the IMC counters information in the device tree
at the "imc-counters" device node which has a compatible field
"ibm,opal-in-memory-counters".

Parsing of the Events' information:
To parse the IMC PMUs and events information, the kernel has to
discover the "imc-counters" node and walk through the pmu and event
nodes.

Here is an excerpt of the DT showing the imc-counters node with the
mcs0 (nest), core and thread nodes:
https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts


/dts-v1/;

[...]


/ {
        name = "";
        compatible = "ibm,opal-in-memory-counters";
        #address-cells = <0x1>;
        #size-cells = <0x1>;
        imc-nest-offset = <0x320000>;
        imc-nest-size = <0x30000>;
        version-id = "";

        NEST_MCS: nest-mcs-events {
                #address-cells = <0x1>;
                #size-cells = <0x1>;

                event@0 {
                        event-name = "RRTO_QFULL_NO_DISP" ;
                        reg = <0x0 0x8>;
                        desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
                };
                event@8 {
                        event-name = "WRTO_QFULL_NO_DISP" ;
                        reg = <0x8 0x8>;
                        desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
                };
		[...]
	mcs0 {
                compatible = "ibm,imc-counters-nest";
                events-prefix = "PM_MCS0_";
                unit = "";
                scale = "";
                reg = <0x118 0x8>;
                events = < &NEST_MCS >;
        };

        mcs1 {
                compatible = "ibm,imc-counters-nest";
                events-prefix = "PM_MCS1_";
                unit = "";
                scale = "";
                reg = <0x198 0x8>;
                events = < &NEST_MCS >;
        };
	[...]
CORE_EVENTS: core-events {
                #address-cells = <0x1>;
                #size-cells = <0x1>;

                event@e0 {
                        event-name = "0THRD_NON_IDLE_PCYC" ;
                        reg = <0xe0 0x8>;
                        desc = "The number of processor cycles when all threads are idle" ;
                };
                event@120 {
                        event-name = "1THRD_NON_IDLE_PCYC" ;
                        reg = <0x120 0x8>;
                        desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
                };
		[...]
       core {
                compatible = "ibm,imc-counters-core";
                events-prefix = "CPM_";
                unit = "";
                scale = "";
                reg = <0x0 0x8>;
                events = < &CORE_EVENTS >;
        };

        thread {
                compatible = "ibm,imc-counters-core";
                events-prefix = "CPM_";
                unit = "";
                scale = "";
                reg = <0x0 0x8>;
                events = < &CORE_EVENTS >;
        };
};

From the device tree, the kernel parses the PMUs and their events'
information.

After parsing the IMC PMUs and their events, the PMUs and their
attributes are registered in the kernel.

This patchset (patches 9 and 10) configures the thread-level IMC PMUs
to count for tasks, which gives us the thread-level metric values per
task.

Example Usage:
 # perf list

  [...]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/           [Kernel PMU event]
  nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0_LAST_SAMPLE/ [Kernel PMU event]
  [...]
  core_imc/CPM_NON_IDLE_INST/                        [Kernel PMU event]
  core_imc/CPM_NON_IDLE_PCYC/                        [Kernel PMU event]
  [...]
  thread_imc/CPM_NON_IDLE_INST/                      [Kernel PMU event]
  thread_imc/CPM_NON_IDLE_PCYC/                      [Kernel PMU event]

To see per chip data for nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/ :
 # perf stat -e "nest_mcs0/PM_MCS_DOWN_128B_DATA_XFER_MC0/" -a --per-socket

To see non-idle instructions for core 0 :
 # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000

To see non-idle instructions for a "make" :
 # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make

Comments/feedback/suggestions are welcome.


TODO:
1) Add a sysfs interface to disable the core IMC (both for ldbar and pdbar).


Changelog:

v7 -> v8:
 - The OPAL call API for nest and core is changed:
   OPAL_NEST_IMC_COUNTERS_CONTROL and
   OPAL_CORE_IMC_COUNTERS_CONTROL are replaced with
   OPAL_IMC_COUNTERS_INIT, OPAL_IMC_COUNTERS_START and
   OPAL_IMC_COUNTERS_STOP.
 - thread_imc doesn't have a CPUMASK_ATTR, hence patch 09/10
   adds a fix which swaps the IMC_EVENT_ATTR slot with the
   IMC_CPUMASK_ATTR slot.

v6 -> v7:
 - Updated the commit message and code comments.
 - Changed the counter init code to disable the
   nest/core counters by default and enable them only
   when they are used.
 - Updated the pmu-setup code to register PMUs which
   don't have events.
 - Replaced imc_event_info_val() with imc_event_prop_update().
 - Updated the imc_pmu_setup() code, by checking for the "value"
   of the compatible property instead of merely checking for compatible.
 - Removed imc_get_domain().
 - init_imc_pmu() and imc_pmu_setup() are made __init.
 - update_max_val() is invoked immediately after updating the offset value.
v5 -> v6:
 - Merged a few patches for readability and code flow.
 - Updated the commit message and code comments.
 - Updated the cpuhotplug code and added checks for the perf migration context.
 - Added READ_ONCE() when reading the counter data.
 - Replaced of_property_read_u32() with of_get_address() for the "reg" property read.
 - Replaced UNKNOWN_DOMAIN with IMC_DOMAIN_UNKNOWN.
v4 -> v5:
 - Updated opal call numbers
 - Added a patch to disable Core-IMC device using shutdown callback
 - Added patch to support cpuhotplug for thread-imc
 - Added patch to disable and enable core imc engine in cpuhot plug path
v3 -> v4:
 - Changed the events parser code to discover the PMU and events because
   of the changed format of the IMC DTS file (Patch 3).
 - Implemented the two TODOs to include core and thread IMC support with
   this patchset (Patches 7 through 10).
 - Changed the CPU hotplug code of Nest IMC PMUs to include a new state
   CPUHP_AP_PERF_POWERPC_NEST_ONLINE (Patch 6).
v2 -> v3:
 - Changed all references for IMA (In-Memory Accumulation) to IMC (In-Memory
   Collection).
v1 -> v2:
 - Account for the cases where a PMU can have a common scale and unit
   values for all its supported events (Patch 3/6).
 - Fixed a Build error (for maple_defconfig) by enabling imc_pmu.o
   only for CONFIG_PPC_POWERNV=y (Patch 4/6)
 - Read from the "event-name" property instead of "name" for an event
   node (Patch 3/6).

Anju T Sudhakar (6):
  powerpc/powernv: Autoload IMC device driver module
  powerpc/powernv: Detect supported IMC units and their events
  powerpc/perf: IMC pmu cpumask and cpuhotplug support
  powerpc/powernv: Thread IMC events detection
  powerpc/perf: Thread IMC PMU functions
  powerpc/perf: Thread imc cpuhotplug support

Hemant Kumar (4):
  powerpc/powernv: Data structure and macros definitions for IMC
  powerpc/perf: Add generic IMC pmu group and event functions
  powerpc/powernv: Core IMC events detection
  powerpc/perf: PMU functions for Core IMC and hotplugging

 
 arch/powerpc/include/asm/imc-pmu.h             |  118 +++
 arch/powerpc/include/asm/opal-api.h            |   12 +-
 arch/powerpc/include/asm/opal.h                |    4 +
 arch/powerpc/perf/Makefile                     |    3 +
 arch/powerpc/perf/imc-pmu.c                    | 1098 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/Kconfig         |   10 +
 arch/powerpc/platforms/powernv/Makefile        |    1 +
 arch/powerpc/platforms/powernv/opal-imc.c      |  607 +++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/opal.c          |   18 +
 include/linux/cpuhotplug.h                     |    3 +
 11 files changed, 1876 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h
 create mode 100644 arch/powerpc/perf/imc-pmu.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

-- 
2.7.4

^ permalink raw reply	[flat|nested] 26+ messages in thread

* [PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
                   ` (8 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Hemant Kumar <hemant@linux.vnet.ibm.com>

Create a new header file to add the data structures and
macros needed for In-Memory Collection (IMC) counter support.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h | 95 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 95 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 0000000..d0193c8
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,95 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+/*
+ * For static allocation of some of the structures.
+ */
+#define IMC_MAX_CHIPS			32
+#define IMC_MAX_PMUS			32
+
+/*
+ * This macro is used for memory buffer allocation of
+ * event names and event string
+ */
+#define IMC_MAX_NAME_VAL_LEN		96
+
+/*
+ * Currently the microcode supports a max of 256KB of counter memory
+ * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
+ */
+#define IMC_NEST_MAX_PAGES		64
+
+/*
+ * Compatibility macros for IMC devices
+ */
+#define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
+#define IMC_DTB_NEST_COMPAT		"ibm,imc-counters-nest"
+
+/*
+ * Structure to hold per chip specific memory address
+ * information for nest pmus. Nest Counter data are exported
+ * in per-chip reserved memory region by the PORE Engine.
+ */
+struct perchip_nest_info {
+	u32 chip_id;
+	u64 pbase;
+	u64 vbase[IMC_NEST_MAX_PAGES];
+	u64 size;
+};
+
+/*
+ * Placeholder for nest pmu events and values.
+ */
+struct imc_events {
+	char *ev_name;
+	char *ev_value;
+};
+
+#define IMC_FORMAT_ATTR		0
+#define IMC_CPUMASK_ATTR	1
+#define IMC_EVENT_ATTR		2
+#define IMC_NULL_ATTR		3
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will
+ * hold the pmu functions and attrs for each imc pmu and
+ * will be referenced at the time of pmu registration.
+ */
+struct imc_pmu {
+	struct pmu pmu;
+	int domain;
+	/*
+	 * Attribute groups for the PMU. Slot 0 used for
+	 * format attribute, slot 1 used for cpumask attribute,
+	 * slot 2 used for event attribute. Slot 3 keep as
+	 * NULL.
+	 */
+	const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST		1
+#define IMC_DOMAIN_UNKNOWN	-1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-11  7:49   ` Stewart Smith
  2017-05-04 14:19 ` [PATCH v8 03/10] powerpc/powernv: Detect supported IMC units and their events Anju T Sudhakar
                   ` (7 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

This patch does three things:
 - Enables "opal.c" to create a platform device for the IMC interface
   according to the appropriate compatibility string.
 - Finds the reserved-memory region details from the system device tree
   and gets the base address of the HOMER (reserved memory) region for
   each chip.
 - Gets the nest PMU counter data offsets (in the HOMER region) and
   their sizes. The offsets for the counters' data are fixed and won't
   change from chip to chip.

The device tree parsing logic is separated from the PMU creation
functions (which is done in subsequent patches).

The patch also adds a Kconfig option, CONFIG_HV_PERF_IMC_CTRS, for the
IMC driver.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig    |  10 +++
 arch/powerpc/platforms/powernv/Makefile   |   1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 140 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c     |  18 ++++
 4 files changed, 169 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 3a07e4d..1b90a98 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -27,3 +27,13 @@ config OPAL_PRD
 	help
 	  This enables the opal-prd driver, a facility to run processor
 	  recovery diagnostics on OpenPower machines
+
+config HV_PERF_IMC_CTRS
+       bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
+       default y
+       depends on PERF_EVENTS && PPC_POWERNV
+       help
+	  Enable access to hypervisor supplied in-memory collection counters
+	  in perf. IMC counters are available from Power9 systems.
+
+          If unsure, select Y.
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb..715e531 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 0000000..3a87000
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,140 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright	(C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *		(C) 2017 Anju T Sudhakar, IBM Corporation.
+ *		(C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/poll.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/crash_dump.h>
+#include <asm/opal.h>
+#include <asm/io.h>
+#include <asm/uaccess.h>
+#include <asm/cputable.h>
+#include <asm/imc-pmu.h>
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+
+/*
+ * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
+ */
+static void __init imc_pmu_setup(struct device_node *parent)
+{
+	if (!parent)
+		return;
+}
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+	struct device_node *imc_dev, *dn, *rm_node = NULL;
+	struct perchip_nest_info *pcni;
+	u32 pages, nest_offset, nest_size, chip_id;
+	int i = 0;
+	const __be32 *addrp;
+	u64 reg_addr, reg_size;
+
+	if (!pdev || !pdev->dev.of_node)
+		return -ENODEV;
+
+	/*
+	 * Check whether this is kdump kernel. If yes, just return.
+	 */
+	if (is_kdump_kernel())
+		return -ENODEV;
+
+	imc_dev = pdev->dev.of_node;
+
+	/*
+	 * Nest counter data are saved in a reserved memory called HOMER.
+	 * "imc-nest-offset" identifies the counter data location within HOMER.
+	 * size : size of the entire nest-counters region
+	 */
+	if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
+		goto err;
+
+	if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
+		goto err;
+
+	/* Sanity check */
+	if ((nest_size/PAGE_SIZE) > IMC_NEST_MAX_PAGES)
+		goto err;
+
+	/* Find the "HOMER region" for each chip */
+	rm_node = of_find_node_by_path("/reserved-memory");
+	if (!rm_node)
+		goto err;
+
+	/*
+	 * We need to look for the "ibm,homer-image" node in the
+	 * "/reserved-memory" node.
+	 */
+	for (dn = of_find_node_by_name(rm_node, "ibm,homer-image"); dn;
+			dn = of_find_node_by_name(dn, "ibm,homer-image")) {
+
+		/* Get the chip id to which the above homer region belongs to */
+		if (of_property_read_u32(dn, "ibm,chip-id", &chip_id))
+			goto err;
+
+		pcni = &nest_perchip_info[chip_id];
+		addrp = of_get_address(dn, 0, &reg_size, NULL);
+		if (!addrp)
+			goto err;
+
+		/* Fetch the homer region base address */
+		reg_addr = of_read_number(addrp, 2);
+		pcni->pbase = reg_addr;
+		/* Add the nest IMC Base offset */
+		pcni->pbase = pcni->pbase + nest_offset;
+		/* Fetch the size of the homer region */
+		pcni->size = nest_size;
+
+		for (i = 0; i < (pcni->size / PAGE_SIZE); i++) {
+			pages = PAGE_SIZE * i;
+			pcni->vbase[i] = (u64)phys_to_virt(pcni->pbase + pages);
+		}
+	}
+
+	imc_pmu_setup(imc_dev);
+
+	return 0;
+err:
+	return -ENODEV;
+}
+
+static const struct of_device_id opal_imc_match[] = {
+	{ .compatible = IMC_DTB_COMPAT },
+	{},
+};
+
+static struct platform_driver opal_imc_driver = {
+	.driver = {
+		.name = "opal-imc-counters",
+		.of_match_table = opal_imc_match,
+	},
+	.probe = opal_imc_counters_probe,
+};
+
+MODULE_DEVICE_TABLE(of, opal_imc_match);
+module_platform_driver(opal_imc_driver);
+MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index e0f856b..b06f896 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -14,6 +14,7 @@
 #include <linux/printk.h>
 #include <linux/types.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_fdt.h>
 #include <linux/of_platform.h>
 #include <linux/interrupt.h>
@@ -30,6 +31,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/imc-pmu.h>
 
 #include "powernv.h"
 
@@ -631,6 +633,17 @@ static void opal_pdev_init(const char *compatible)
 		of_platform_device_create(np, NULL, NULL);
 }
 
+#ifdef CONFIG_HV_PERF_IMC_CTRS
+static void __init opal_imc_init_dev(void)
+{
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
+	if (np)
+		of_platform_device_create(np, NULL, NULL);
+}
+#endif
+
 static int kopald(void *unused)
 {
 	unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
@@ -704,6 +717,11 @@ static int __init opal_init(void)
 	/* Setup a heatbeat thread if requested by OPAL */
 	opal_init_heartbeat();
 
+#ifdef CONFIG_HV_PERF_IMC_CTRS
+	/* Detect IMC pmu counters support and create PMUs */
+	opal_imc_init_dev();
+#endif
+
 	/* Create leds platform devices */
 	leds = of_find_node_by_path("/ibm,opal/leds");
 	if (leds) {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 03/10] powerpc/powernv: Detect supported IMC units and their events
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 04/10] powerpc/perf: Add generic IMC pmu group and event functions Anju T Sudhakar
                   ` (6 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Parse device tree to detect IMC units. Traverse through each IMC unit
node to find supported events and corresponding unit/scale files (if any).

Here is the DTS file for reference:

	https://github.com/open-power/ima-catalog/blob/master/81E00612.4E0100.dts

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and event nodes
for these IMC PMUs. The PMU nodes have an "events" property which has a
phandle value for the actual events node. The events are separated from
the PMU nodes to abstract out the common events. For example, the PMU
nodes "mcs0", "mcs1" etc. contain a pointer to "nest-mcs-events", since
the events are common between these PMUs. These events have a different
prefix based on their relation to different PMUs, and hence the PMU
nodes themselves contain an "events-prefix" property. The value of this
property, concatenated with the event name, forms the actual event
name. Also, each PMU node has a "reg" field with the base offset for
the events which belong to this PMU. This "reg" field is added to the
event's "reg" field in the "events" node, which gives us the location
of the counter data. The kernel uses this offset as the event
configuration value.

The device tree parser code also looks for the scale/unit properties in
the event node and passes the value on as an event attr for the perf
interface to use in post-processing by the perf tool. Some PMUs may
have common scale and unit properties, which implies that all events
supported by such a PMU inherit the scale and unit properties of the
PMU itself. For those events, we need to set the common unit and scale
values.

On failure to initialize any unit or any event, that unit is disabled
and setup continues for the rest of them.

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/opal-imc.c | 413 ++++++++++++++++++++++++++++++
 1 file changed, 413 insertions(+)

diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 3a87000..0ddaf7d 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -33,15 +33,428 @@
 #include <asm/cputable.h>
 #include <asm/imc-pmu.h>
 
+u64 nest_max_offset;
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_prop_update(char *name, struct imc_events *events)
+{
+	char *buf;
+
+	if (!events || !name)
+		return -EINVAL;
+
+	/* memory for content */
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	events->ev_name = name;
+	events->ev_value = buf;
+	return 0;
+}
+
+static int imc_event_prop_str(struct property *pp, char *name,
+			      struct imc_events *events)
+{
+	int ret;
+
+	ret = imc_event_prop_update(name, events);
+	if (ret)
+		return ret;
+
+	if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+	   (pp->length > IMC_MAX_NAME_VAL_LEN))
+		return -EINVAL;
+	strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+	return 0;
+}
+
+static int imc_event_prop_val(char *name, u32 val,
+			      struct imc_events *events)
+{
+	int ret;
+
+	ret = imc_event_prop_update(name, events);
+	if (ret)
+		return ret;
+	snprintf(events->ev_value, IMC_MAX_NAME_VAL_LEN, "event=0x%x", val);
+
+	return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+			      struct imc_events *events, char *ev_name)
+{
+	char *buf;
+	int ret;
+
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	sprintf(buf, "%s.%s", ev_name, event_prop);
+	ret = imc_event_prop_str(pp, buf, events);
+	if (ret) {
+		if (events->ev_name)
+			kfree(events->ev_name);
+		if (events->ev_value)
+			kfree(events->ev_value);
+	}
+	return ret;
+}
+
+/*
+ * Updates the maximum offset for an event in the pmu with domain
+ * "pmu_domain".
+ */
+static void update_max_value(u32 value, int pmu_domain)
+{
+	switch (pmu_domain) {
+	case IMC_DOMAIN_NEST:
+		if (nest_max_offset < value)
+			nest_max_offset = value;
+		break;
+	default:
+		/* Unknown domain, return */
+		return;
+	}
+}
+
+/*
+ * imc_events_node_parser: Parse the event node "dev" and assign the parsed
+ *                         information to event "events".
+ *
+ * Parses the "reg", "scale" and "unit" properties of this event.
+ * "reg" gives us the event offset in the counter memory.
+ */
+static int imc_events_node_parser(struct device_node *dev,
+				  struct imc_events *events,
+				  struct property *event_scale,
+				  struct property *event_unit,
+				  struct property *name_prefix,
+				  u32 reg, int pmu_domain)
+{
+	struct property *name, *pp;
+	char *ev_name;
+	u32 val;
+	int idx = 0, ret;
+
+	if (!dev)
+		goto fail;
+
+	/* Find the event name */
+	name = of_find_property(dev, "event-name", NULL);
+	if (!name)
+		return -ENODEV;
+
+	if (!name->value ||
+	  (strnlen(name->value, name->length) == name->length) ||
+	  (name->length > IMC_MAX_NAME_VAL_LEN))
+		return -EINVAL;
+
+	ev_name = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!ev_name)
+		return -ENOMEM;
+
+	snprintf(ev_name, IMC_MAX_NAME_VAL_LEN, "%s%s",
+		 (char *)name_prefix->value,
+		 (char *)name->value);
+
+	/*
+	 * Parse each property of this event node "dev". Property "reg" has
+	 * the offset which is assigned to the event name. Other properties
+	 * like "scale" and "unit" are assigned to event.scale and event.unit
+	 * accordingly.
+	 */
+	for_each_property_of_node(dev, pp) {
+		/*
+		 * If there is an issue in parsing a single property of
+		 * this event, we just clean up the buffers, but we still
+		 * continue to parse. XXX: This could be rewritten to skip the
+		 * entire event node in case of parsing issues, but that can be
+		 * done later.
+		 */
+		if (strncmp(pp->name, "reg", 3) == 0) {
+			of_property_read_u32(dev, pp->name, &val);
+			val += reg;
+			update_max_value(val, pmu_domain);
+			ret = imc_event_prop_val(ev_name, val, &events[idx]);
+			if (ret) {
+				if (events[idx].ev_name)
+					kfree(events[idx].ev_name);
+				if (events[idx].ev_value)
+					kfree(events[idx].ev_value);
+				goto fail;
+			}
+			idx++;
+			/*
+			 * If the common scale and unit properties are available,
+			 * then, assign them to this event
+			 */
+			if (event_scale) {
+				ret = set_event_property(event_scale, "scale",
+							 &events[idx],
+							 ev_name);
+				if (ret)
+					goto fail;
+				idx++;
+			}
+			if (event_unit) {
+				ret = set_event_property(event_unit, "unit",
+							 &events[idx],
+							 ev_name);
+				if (ret)
+					goto fail;
+				idx++;
+			}
+		} else if (strncmp(pp->name, "unit", 4) == 0) {
+			/*
+			 * The event's unit and scale properties can override the
+			 * PMU's unit and scale properties, if present.
+			 */
+			ret = set_event_property(pp, "unit", &events[idx],
+						 ev_name);
+			if (ret)
+				goto fail;
+			idx++;
+		} else if (strncmp(pp->name, "scale", 5) == 0) {
+			ret = set_event_property(pp, "scale", &events[idx],
+						 ev_name);
+			if (ret)
+				goto fail;
+			idx++;
+		}
+	}
+
+	return idx;
+fail:
+	return -EINVAL;
+}
+
+/*
+ * get_nr_children : Returns the number of children for a pmu device node.
+ */
+static int get_nr_children(struct device_node *pmu_node)
+{
+	struct device_node *child;
+	int i = 0;
+
+	for_each_child_of_node(pmu_node, child)
+		i++;
+	return i;
+}
+
+/*
+ * imc_free_events : Cleanup the "events" list having "nr_entries" entries.
+ */
+static void imc_free_events(struct imc_events *events, int nr_entries)
+{
+	int i;
+
+	/* Nothing to clean, return */
+	if (!events)
+		return;
+
+	for (i = 0; i < nr_entries; i++) {
+		kfree(events[i].ev_name);
+		kfree(events[i].ev_value);
+	}
+
+	kfree(events);
+}
+
+/*
+ * imc_events_setup() : Finds the event node for this pmu, gets the
+ *                      number of supported events, allocates memory
+ * for the events, and returns the address of the allocated memory.
+ */
+static struct imc_events *imc_events_setup(struct device_node *parent,
+					   int pmu_index,
+					   struct imc_pmu *pmu_ptr,
+					   u32 prop,
+					   int *idx)
+{
+	struct device_node *ev_node = NULL, *dir = NULL;
+	u32 reg;
+	struct imc_events *events;
+	struct property *scale_pp, *unit_pp, *name_prefix;
+	int ret = 0, nr_children = 0;
+
+	/*
+	 * Fetch the actual node where the events for this PMU exist.
+	 */
+	dir = of_find_node_by_phandle(prop);
+	if (!dir)
+		return NULL;
+	/*
+	 * Get the maximum no. of events in this node.
+	 * Multiply by 3 to account for the .scale and .unit properties.
+	 * This number determines the amount of memory needed to set up
+	 * the events for this pmu.
+	 */
+	nr_children = get_nr_children(dir) * 3;
+
+	events = kzalloc((sizeof(struct imc_events) * nr_children),
+			 GFP_KERNEL);
+	if (!events)
+		return NULL;
+
+	/*
+	 * Check if there are common "scale" and "unit" properties inside
+	 * the PMU node that apply to all the events supported by this PMU.
+	 */
+	scale_pp = of_find_property(parent, "scale", NULL);
+	unit_pp = of_find_property(parent, "unit", NULL);
+
+	/*
+	 * Get the event-prefix property from the PMU node,
+	 * which is prepended to the event names.
+	 */
+	name_prefix = of_find_property(parent, "events-prefix", NULL);
+	if (!name_prefix)
+		goto free_events;
+
+	/*
+	 * "reg" property gives out the base offset of the counters data
+	 * for this PMU.
+	 */
+	of_property_read_u32(parent, "reg", &reg);
+
+	if (!name_prefix->value ||
+	   (strnlen(name_prefix->value, name_prefix->length) == name_prefix->length) ||
+	   (name_prefix->length > IMC_MAX_NAME_VAL_LEN))
+		goto free_events;
+
+	/* Loop through event nodes */
+	for_each_child_of_node(dir, ev_node) {
+		ret = imc_events_node_parser(ev_node, &events[*idx], scale_pp,
+				unit_pp, name_prefix, reg, pmu_ptr->domain);
+		if (ret < 0) {
+			/* Unable to parse this event */
+			if (ret == -ENOMEM)
+				goto free_events;
+			continue;
+		}
+
+		/*
+		 * imc_events_node_parser() returns the number of event
+		 * entries created for this node, which may also include
+		 * scale and unit entries.
+		 */
+		*idx += ret;
+	}
+	return events;
+
+free_events:
+	imc_free_events(events, *idx);
+	return NULL;
+
+}
+
+/*
+ * imc_pmu_create : Takes the parent device node, which is the pmu unit,
+ *                  and a pmu_index as inputs.
+ * Allocates memory for the pmu, sets up its domain (NEST), assigns a
+ * name for the pmu, and calls imc_events_setup() to allocate memory for
+ * the events supported by this pmu.
+ * If everything goes fine, it calls init_imc_pmu() to set up the pmu
+ * device and register it.
+ */
+static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
+{
+	struct imc_events *events = NULL;
+	struct imc_pmu *pmu_ptr;
+	u32 prop = 0;
+	struct property *pp;
+	char *buf;
+	int idx = 0, ret = 0;
+
+	if (!parent)
+		return -EINVAL;
+
+	/* memory for pmu */
+	pmu_ptr = kzalloc(sizeof(struct imc_pmu), GFP_KERNEL);
+	if (!pmu_ptr)
+		return -ENOMEM;
+
+	pmu_ptr->domain = domain;
+	if (pmu_ptr->domain == IMC_DOMAIN_UNKNOWN)
+		goto free_pmu;
+
+	/* Needed for hotplug/migration */
+	per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+	pp = of_find_property(parent, "name", NULL);
+	if (!pp) {
+		ret = -ENODEV;
+		goto free_pmu;
+	}
+
+	if (!pp->value ||
+	   (strnlen(pp->value, pp->length) == pp->length) ||
+	   (pp->length > IMC_MAX_NAME_VAL_LEN)) {
+		ret = -EINVAL;
+		goto free_pmu;
+	}
+
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto free_pmu;
+	}
+	/* Save the name to register it later */
+	snprintf(buf, IMC_MAX_NAME_VAL_LEN, "nest_%s", (char *)pp->value);
+	pmu_ptr->pmu.name = (char *)buf;
+
+	/*
+	 * "events" property inside a PMU node contains the phandle value
+	 * for the actual events node. The "events" node for the IMC PMU
+	 * is not in this node, but rather inside the "imc-counters" node,
+	 * since we want to factor out the common events (thereby reducing
+	 * the size of the device tree).
+	 */
+	of_property_read_u32(parent, "events", &prop);
+	if (prop)
+		events = imc_events_setup(parent, pmu_index, pmu_ptr,
+					  prop, &idx);
+	return 0;
+
+free_pmu:
+	kfree(pmu_ptr);
+	return ret;
+}
+
 
 /*
  * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
+ *
+ * The top level "imc-counters" node contains both event nodes and pmu
+ * unit nodes. We only consider the pmu unit nodes here.
  */
 static void __init imc_pmu_setup(struct device_node *parent)
 {
+	struct device_node *child;
+	int pmu_count = 0, rc = 0, domain;
+
 	if (!parent)
 		return;
+	/*
+	 * Loop through the imc-counters tree for each compatible
+	 * "ibm,imc-counters-nest", and update "struct imc_pmu".
+	 */
+	for_each_compatible_node(child, NULL, IMC_DTB_NEST_COMPAT) {
+		domain = IMC_DOMAIN_NEST;
+		rc = imc_pmu_create(child, pmu_count, domain);
+		if (rc)
+			return;
+		pmu_count++;
+	}
 }
 
 static int opal_imc_counters_probe(struct platform_device *pdev)
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 04/10] powerpc/perf: Add generic IMC pmu group and event functions
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (2 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support Anju T Sudhakar
                   ` (5 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Hemant Kumar <hemant@linux.vnet.ibm.com>

Device tree IMC driver code parses the IMC units and their events. It
passes the information to IMC pmu code which is placed in powerpc/perf
as "imc-pmu.c".

This patch adds a set of generic imc pmu related event functions to be
used by each imc pmu unit. It adds code to set up the format attribute
and to register imc pmus, and adds an event_init function for nest_imc
events.

Since the IMC counters' data is periodically fed to a memory location,
the functions to read/update, start/stop and add/del can be generic and
can be used by all IMC PMU units.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   3 +
 arch/powerpc/perf/Makefile                |   3 +
 arch/powerpc/perf/imc-pmu.c               | 269 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-imc.c |  10 +-
 4 files changed, 283 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index d0193c8..6bbe184 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,4 +92,7 @@ struct imc_pmu {
 #define IMC_DOMAIN_NEST		1
 #define IMC_DOMAIN_UNKNOWN	-1
 
+extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern int __init init_imc_pmu(struct imc_events *events, int idx, struct imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 4d606b9..b29d918 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= power4-pmu.o ppc970-pmu.o power5-pmu.o \
 				   power5+-pmu.o power6-pmu.o power7-pmu.o \
 				   isa207-common.o power8-pmu.o power9-pmu.o
+
+obj-$(CONFIG_HV_PERF_IMC_CTRS)	+= imc-pmu.o
+
 obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 0000000..f09a37a
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,269 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms of the GNU General Public License version 2 as published
+ * by the Free Software Foundation.
+ */
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/imc-pmu.h>
+#include <asm/cputhreads.h>
+#include <asm/smp.h>
+#include <linux/string.h>
+
+struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+/* Needed for sanity check */
+extern u64 nest_max_offset;
+
+PMU_FORMAT_ATTR(event, "config:0-20");
+static struct attribute *imc_format_attrs[] = {
+	&format_attr_event.attr,
+	NULL,
+};
+
+static struct attribute_group imc_format_group = {
+	.name = "format",
+	.attrs = imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+	int chip_id;
+	u32 config = event->attr.config;
+	struct perchip_nest_info *pcni;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	/* Sanity check for config (event offset) */
+	if (config > nest_max_offset)
+		return -EINVAL;
+
+	chip_id = topology_physical_package_id(event->cpu);
+	pcni = &nest_perchip_info[chip_id];
+
+	/*
+	 * Memory for Nest HW counter data could be in multiple pages.
+	 * Hence check and pick the right event base page for the chip
+	 * with "chip_id" and add "config" to it.
+	 */
+	event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
+							(config & ~PAGE_MASK);
+
+	return 0;
+}
+
+static void imc_read_counter(struct perf_event *event)
+{
+	u64 *addr, data;
+
+	/*
+	 * In-Memory Collection (IMC) counters are free-flowing counters.
+	 * So we take a snapshot of the counter value on enable and save
+	 * it to calculate the delta at a later stage, to present the
+	 * event counter value.
+	 */
+	addr = (u64 *)event->hw.event_base;
+	data = __be64_to_cpu(READ_ONCE(*addr));
+	local64_set(&event->hw.prev_count, data);
+}
+
+static void imc_perf_event_update(struct perf_event *event)
+{
+	u64 counter_prev, counter_new, final_count, *addr;
+
+	addr = (u64 *)event->hw.event_base;
+	counter_prev = local64_read(&event->hw.prev_count);
+	counter_new = __be64_to_cpu(READ_ONCE(*addr));
+	final_count = counter_new - counter_prev;
+
+	/*
+	 * We need to update prev_count because the counter could be
+	 * read at periodic intervals from the tool side.
+	 */
+	local64_set(&event->hw.prev_count, counter_new);
+	/* Update the delta to the event count */
+	local64_add(final_count, &event->count);
+}
+
+static void imc_event_start(struct perf_event *event, int flags)
+{
+	/*
+	 * In-Memory counters are free-flowing counters. The HW or the
+	 * microcode keeps adding to the counter offset in memory. To get
+	 * the event counter value, we snapshot the value here and
+	 * calculate the delta at a later point.
+	 */
+	imc_read_counter(event);
+}
+
+static void imc_event_stop(struct perf_event *event, int flags)
+{
+	/*
+	 * Take a snapshot and calculate the delta and update
+	 * the event counter values.
+	 */
+	imc_perf_event_update(event);
+}
+
+/*
+ * The wrapper functions are provided here, since a following patch
+ * will take and release a lock in imc_event_start().
+ * The same applies to imc_event_stop().
+ */
+static void nest_imc_event_start(struct perf_event *event, int flags)
+{
+	imc_event_start(event, flags);
+}
+
+static void nest_imc_event_stop(struct perf_event *event, int flags)
+{
+	imc_event_stop(event, flags);
+}
+
+static int nest_imc_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		nest_imc_event_start(event, flags);
+
+	return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct imc_pmu *pmu)
+{
+	if (!pmu)
+		return -EINVAL;
+
+	pmu->pmu.task_ctx_nr = perf_invalid_context;
+	pmu->pmu.event_init = nest_imc_event_init;
+	pmu->pmu.add = nest_imc_event_add;
+	pmu->pmu.del = nest_imc_event_stop;
+	pmu->pmu.start = nest_imc_event_start;
+	pmu->pmu.stop = nest_imc_event_stop;
+	pmu->pmu.read = imc_perf_event_update;
+	pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
+	pmu->pmu.attr_groups = pmu->attr_groups;
+
+	return 0;
+}
+
+/* dev_str_attr : Populate event "name" and string "str" in attribute */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return NULL;
+	sysfs_attr_init(&attr->attr.attr);
+
+	attr->event_str = str;
+	attr->attr.attr.name = name;
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = perf_event_sysfs_show;
+
+	return &attr->attr.attr;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ *                         and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct imc_events *events,
+				  int idx, struct imc_pmu *pmu)
+{
+	struct attribute_group *attr_group;
+	struct attribute **attrs;
+	int i;
+
+	/* If there are no events for this pmu, just return zero */
+	if (!events)
+		return 0;
+
+	/* Allocate memory for attribute group */
+	attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+	if (!attr_group)
+		return -ENOMEM;
+
+	/* Allocate memory for attributes */
+	attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+	if (!attrs) {
+		kfree(attr_group);
+		return -ENOMEM;
+	}
+
+	attr_group->name = "events";
+	attr_group->attrs = attrs;
+	for (i = 0; i < idx; i++, events++) {
+		attrs[i] = dev_str_attr((char *)events->ev_name,
+					(char *)events->ev_value);
+	}
+
+	/* Save the event attribute */
+	pmu->attr_groups[IMC_EVENT_ATTR] = attr_group;
+	return 0;
+}
+
+/*
+ * init_imc_pmu : Setup and register the IMC pmu device.
+ *
+ * @events:	events memory for this pmu.
+ * @idx:	number of event entries created.
+ * @pmu_ptr:	memory allocated for this pmu.
+ */
+int __init init_imc_pmu(struct imc_events *events, int idx,
+		 struct imc_pmu *pmu_ptr)
+{
+	int ret = -ENODEV;
+
+	ret = update_events_in_group(events, idx, pmu_ptr);
+	if (ret)
+		goto err_free;
+
+	ret = update_pmu_ops(pmu_ptr);
+	if (ret)
+		goto err_free;
+
+	ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+	if (ret)
+		goto err_free;
+
+	pr_info("%s performance monitor hardware support registered\n",
+		pmu_ptr->pmu.name);
+
+	return 0;
+
+err_free:
+	/* Only free the attr_groups which are dynamically allocated  */
+	if (pmu_ptr->attr_groups[IMC_EVENT_ATTR]) {
+		kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
+		kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
+	}
+
+	return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 0ddaf7d..61f6d67 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,8 +34,6 @@
 #include <asm/imc-pmu.h>
 
 u64 nest_max_offset;
-struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
-struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -423,8 +421,16 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 	if (prop)
 		events = imc_events_setup(parent, pmu_index, pmu_ptr,
 					  prop, &idx);
+	/* Function to register IMC pmu */
+	ret = init_imc_pmu(events, idx, pmu_ptr);
+	if (ret) {
+		pr_err("IMC PMU %s Register failed\n", pmu_ptr->pmu.name);
+		goto free_events;
+	}
 	return 0;
 
+free_events:
+	imc_free_events(events, idx);
 free_pmu:
 	kfree(pmu_ptr);
 	return ret;
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (3 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 04/10] powerpc/perf: Add generic IMC pmu group and event functions Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-08 14:12   ` Daniel Axtens
  2017-05-10 12:09   ` Thomas Gleixner
  2017-05-04 14:19 ` [PATCH v8 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
                   ` (4 subsequent siblings)
  9 siblings, 2 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

This patch adds a cpumask attribute to be used by each IMC pmu. For nest
PMUs, only one cpu (any online CPU) from each chip is designated to read
counters.

On CPU hotplug, the dying CPU is checked to see whether it is one of the
designated cpus; if so, the next online cpu from the same chip (for nest
units) is designated as the new cpu to read counters. For this purpose,
we introduce a new state: CPUHP_AP_PERF_POWERPC_NEST_ONLINE.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h             |   4 +
 arch/powerpc/include/asm/opal-api.h            |  12 +-
 arch/powerpc/include/asm/opal.h                |   4 +
 arch/powerpc/perf/imc-pmu.c                    | 248 ++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
 include/linux/cpuhotplug.h                     |   1 +
 6 files changed, 266 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 6bbe184..1478d0f 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -92,6 +92,10 @@ struct imc_pmu {
 #define IMC_DOMAIN_NEST		1
 #define IMC_DOMAIN_UNKNOWN	-1
 
+#define IMC_COUNTER_ENABLE	1
+#define IMC_COUNTER_DISABLE	0
+
+
 extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern int __init init_imc_pmu(struct imc_events *events, int idx, struct imc_pmu *pmu_ptr);
diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
index a0aa285..ce863d9 100644
--- a/arch/powerpc/include/asm/opal-api.h
+++ b/arch/powerpc/include/asm/opal-api.h
@@ -168,7 +168,10 @@
 #define OPAL_INT_SET_MFRR			125
 #define OPAL_PCI_TCE_KILL			126
 #define OPAL_NMMU_SET_PTCR			127
-#define OPAL_LAST				127
+#define OPAL_IMC_COUNTERS_INIT			149
+#define OPAL_IMC_COUNTERS_START			150
+#define OPAL_IMC_COUNTERS_STOP			151
+#define OPAL_LAST				151
 
 /* Device tree flags */
 
@@ -928,6 +931,13 @@ enum {
 	OPAL_PCI_TCE_KILL_ALL,
 };
 
+/* Argument to OPAL_IMC_COUNTERS_*  */
+enum {
+	OPAL_IMC_COUNTERS_NEST = 1,
+	OPAL_IMC_COUNTERS_CORE = 2,
+	OPAL_IMC_COUNTERS_THREAD = 3,
+};
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __OPAL_API_H */
diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
index 1ff03a6..9c16ec6 100644
--- a/arch/powerpc/include/asm/opal.h
+++ b/arch/powerpc/include/asm/opal.h
@@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
 			  uint64_t dma_addr, uint32_t npages);
 int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
 
+int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
+int64_t opal_imc_counters_start(uint32_t type);
+int64_t opal_imc_counters_stop(uint32_t type);
+
 /* Internal functions */
 extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
 				   int depth, void *data);
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index f09a37a..40792424 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -18,6 +18,11 @@
 
 struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+static cpumask_t nest_imc_cpumask;
+
+static atomic_t nest_events;
+/* Used to avoid races when enabling/disabling nest-pmu units */
+static DEFINE_MUTEX(imc_nest_reserve);
 
 /* Needed for sanity check */
 extern u64 nest_max_offset;
@@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
 	.attrs = imc_format_attrs,
 };
 
+/* Get the cpumask printed to a buffer "buf" */
+static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	cpumask_t *active_mask;
+
+	active_mask = &nest_imc_cpumask;
+	return cpumap_print_to_pagebuf(true, buf, active_mask);
+}
+
+static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
+
+static struct attribute *imc_pmu_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static struct attribute_group imc_pmu_cpumask_attr_group = {
+	.attrs = imc_pmu_cpumask_attrs,
+};
+
+/*
+ * nest_init : Initializes the nest imc engine for the current chip.
+ * By default the nest engine is disabled.
+ */
+static void nest_init(int *cpu_opal_rc)
+{
+	int rc;
+
+	/*
+	 * OPAL figures out which CPU to start based on the CPU that is
+	 * currently running when we call into OPAL
+	 */
+	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
+	if (rc)
+		cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+static void nest_change_cpu_context(int old_cpu, int new_cpu)
+{
+	int i;
+
+	for (i = 0;
+	     (i < IMC_MAX_PMUS) && (per_nest_pmu_arr[i] != NULL); i++)
+		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
+							old_cpu, new_cpu);
+}
+
+static int ppc_nest_imc_cpu_online(unsigned int cpu)
+{
+	int nid;
+	const struct cpumask *l_cpumask;
+	struct cpumask tmp_mask;
+
+	/* Find the cpumask of this node */
+	nid = cpu_to_node(cpu);
+	l_cpumask = cpumask_of_node(nid);
+
+	/*
+	 * If any cpu from this node is already present in the mask,
+	 * just return; if not, set this cpu in the mask.
+	 */
+	if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
+		cpumask_set_cpu(cpu, &nest_imc_cpumask);
+		nest_change_cpu_context(-1, cpu);
+		return 0;
+	}
+
+	return 0;
+}
+
+static int ppc_nest_imc_cpu_offline(unsigned int cpu)
+{
+	int nid, target = -1;
+	const struct cpumask *l_cpumask;
+
+	/*
+	 * Check whether this cpu is in the designated list. Don't
+	 * bother if it is not one of them.
+	 */
+	if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
+		return 0;
+
+	/*
+	 * Now that this cpu is one of the designated ones, find the
+	 * next cpu which is a) online and b) in the same chip.
+	 */
+	nid = cpu_to_node(cpu);
+	l_cpumask = cpumask_of_node(nid);
+	target = cpumask_next(cpu, l_cpumask);
+
+	/*
+	 * Update the cpumask with the target cpu and
+	 * migrate the context if needed.
+	 */
+	if (target >= 0 && target < nr_cpu_ids) {
+		cpumask_set_cpu(target, &nest_imc_cpumask);
+		nest_change_cpu_context(cpu, target);
+	}
+	return 0;
+}
+
+static int nest_pmu_cpumask_init(void)
+{
+	const struct cpumask *l_cpumask;
+	int cpu, nid;
+	int *cpus_opal_rc;
+
+	if (!cpumask_empty(&nest_imc_cpumask))
+		return 0;
+
+	/*
+	 * Memory for OPAL call return value.
+	 */
+	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+	if (!cpus_opal_rc)
+		goto fail;
+
+	/*
+	 * Nest PMUs are per-chip counters. So designate a cpu
+	 * from each chip for counter collection.
+	 */
+	for_each_online_node(nid) {
+		l_cpumask = cpumask_of_node(nid);
+
+		/* designate first online cpu in this node */
+		cpu = cpumask_first(l_cpumask);
+		cpumask_set_cpu(cpu, &nest_imc_cpumask);
+	}
+
+	/* Initialize Nest PMUs in each node using designated cpus */
+	on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
+						(void *)cpus_opal_rc, 1);
+
+	/* Check return value array for any OPAL call failure */
+	for_each_cpu(cpu, &nest_imc_cpumask) {
+		if (cpus_opal_rc[cpu])
+			goto fail;
+	}
+
+	cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
+			  "POWER_NEST_IMC_ONLINE",
+			  ppc_nest_imc_cpu_online,
+			  ppc_nest_imc_cpu_offline);
+
+	kfree(cpus_opal_rc);
+	return 0;
+
+fail:
+	kfree(cpus_opal_rc);
+	return -ENODEV;
+}
+
 static int nest_imc_event_init(struct perf_event *event)
 {
 	int chip_id;
@@ -109,6 +268,51 @@ static void imc_perf_event_update(struct perf_event *event)
 	local64_add(final_count, &event->count);
 }
 
+static void nest_imc_start(int *cpu_opal_rc)
+{
+	int rc;
+
+	/* Enable nest engine */
+	rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
+	if (rc)
+		cpu_opal_rc[smp_processor_id()] = 1;
+
+}
+
+static int nest_imc_control(int operation)
+{
+	int *cpus_opal_rc, cpu;
+
+	/*
+	 * Memory for OPAL call return value.
+	 */
+	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+	if (!cpus_opal_rc)
+		return -ENOMEM;
+	switch (operation) {
+	case IMC_COUNTER_ENABLE:
+		/* Initialize Nest PMUs in each node using designated cpus */
+		on_each_cpu_mask(&nest_imc_cpumask,
+				 (smp_call_func_t)nest_imc_start,
+				 (void *)cpus_opal_rc, 1);
+		break;
+	case IMC_COUNTER_DISABLE:
+		/* Disable the counters */
+		on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
+				 (void *)cpus_opal_rc, 1);
+		break;
+	default:
+		kfree(cpus_opal_rc);
+		return -EINVAL;
+	}
+
+	/* Check return value array for any OPAL call failure */
+	for_each_cpu(cpu, &nest_imc_cpumask) {
+		if (cpus_opal_rc[cpu]) {
+			kfree(cpus_opal_rc);
+			return -ENODEV;
+		}
+	}
+	kfree(cpus_opal_rc);
+	return 0;
+}
+
 static void imc_event_start(struct perf_event *event, int flags)
 {
 	/*
@@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
 	imc_perf_event_update(event);
 }
 
-/*
- * The wrapper function is provided here, since we will have reserve
- * and release lock for imc_event_start() in the following patch.
- * Same in case of imc_event_stop().
- */
 static void nest_imc_event_start(struct perf_event *event, int flags)
 {
+	int rc;
+
+	/*
+	 * Nest pmu units are enabled only when they are used.
+	 * See if this is triggered for the first time.
+	 * If yes, take the mutex lock and enable the nest counters.
+	 * If not, just increment the count in nest_events.
+	 */
+	if (atomic_inc_return(&nest_events) == 1) {
+		mutex_lock(&imc_nest_reserve);
+		rc = nest_imc_control(IMC_COUNTER_ENABLE);
+		mutex_unlock(&imc_nest_reserve);
+		if (rc)
+			pr_err("IMC: Unbale to start the counters\n");
+	}
 	imc_event_start(event, flags);
 }
 
 static void nest_imc_event_stop(struct perf_event *event, int flags)
 {
+	int rc;
+
 	imc_event_stop(event, flags);
+	/*
+	 * See if we need to disable the nest PMU.
+	 * If no events are currently in use, then we have to take a
+	 * mutex to ensure that we don't race with another task
+	 * enabling or disabling the nest counters.
+	 */
+	if (atomic_dec_return(&nest_events) == 0) {
+		mutex_lock(&imc_nest_reserve);
+		rc = nest_imc_control(IMC_COUNTER_DISABLE);
+		mutex_unlock(&imc_nest_reserve);
+		if (rc)
+			pr_err("IMC: Disable counters failed\n");
+	}
 }
 
 static int nest_imc_event_add(struct perf_event *event, int flags)
@@ -165,6 +394,7 @@ static int update_pmu_ops(struct imc_pmu *pmu)
 	pmu->pmu.start = nest_imc_event_start;
 	pmu->pmu.stop = nest_imc_event_stop;
 	pmu->pmu.read = imc_perf_event_update;
+	pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
 	pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 	pmu->pmu.attr_groups = pmu->attr_groups;
 
@@ -234,12 +464,20 @@ static int update_events_in_group(struct imc_events *events,
  * @events:	events memory for this pmu.
  * @idx:	number of event entries created.
  * @pmu_ptr:	memory allocated for this pmu.
+ *
+ * init_imc_pmu() sets up the cpumask information for these pmus and
+ * sets up the state machine hotplug notifiers as well.
  */
 int __init init_imc_pmu(struct imc_events *events, int idx,
 		 struct imc_pmu *pmu_ptr)
 {
 	int ret = -ENODEV;
 
+	/* Add cpumask and register for hotplug notification */
+	ret = nest_pmu_cpumask_init();
+	if (ret)
+		return ret;
+
 	ret = update_events_in_group(events, idx, pmu_ptr);
 	if (ret)
 		goto err_free;
diff --git a/arch/powerpc/platforms/powernv/opal-wrappers.S b/arch/powerpc/platforms/powernv/opal-wrappers.S
index da8a0f7..4ede687 100644
--- a/arch/powerpc/platforms/powernv/opal-wrappers.S
+++ b/arch/powerpc/platforms/powernv/opal-wrappers.S
@@ -301,3 +301,6 @@ OPAL_CALL(opal_int_eoi,				OPAL_INT_EOI);
 OPAL_CALL(opal_int_set_mfrr,			OPAL_INT_SET_MFRR);
 OPAL_CALL(opal_pci_tce_kill,			OPAL_PCI_TCE_KILL);
 OPAL_CALL(opal_nmmu_set_ptcr,			OPAL_NMMU_SET_PTCR);
+OPAL_CALL(opal_imc_counters_init,		OPAL_IMC_COUNTERS_INIT);
+OPAL_CALL(opal_imc_counters_start,		OPAL_IMC_COUNTERS_START);
+OPAL_CALL(opal_imc_counters_stop,		OPAL_IMC_COUNTERS_STOP);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 62d240e..51dff54 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -137,6 +137,7 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_ARM_CCN_ONLINE,
 	CPUHP_AP_PERF_ARM_L2X0_ONLINE,
 	CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
+	CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 06/10] powerpc/powernv: Core IMC events detection
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (4 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging Anju T Sudhakar
                   ` (3 subsequent siblings)
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Hemant Kumar <hemant@linux.vnet.ibm.com>

This patch adds support for detection of core IMC events along with the
Nest IMC events. It adds a new domain, IMC_DOMAIN_CORE, which is
determined with the help of the compatible string "ibm,imc-counters-core"
in the IMC device tree.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |  4 +++-
 arch/powerpc/perf/imc-pmu.c               |  3 +++
 arch/powerpc/platforms/powernv/opal-imc.c | 28 +++++++++++++++++++++++++---
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 1478d0f..37fdd79 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -42,6 +42,7 @@
  */
 #define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
 #define IMC_DTB_NEST_COMPAT		"ibm,imc-counters-nest"
+#define IMC_DTB_CORE_COMPAT		"ibm,imc-counters-core"
 
 /*
  * Structure to hold per chip specific memory address
@@ -90,13 +91,14 @@ struct imc_pmu {
  * Domains for IMC PMUs
  */
 #define IMC_DOMAIN_NEST		1
+#define IMC_DOMAIN_CORE		2
 #define IMC_DOMAIN_UNKNOWN	-1
 
 #define IMC_COUNTER_ENABLE	1
 #define IMC_COUNTER_DISABLE	0
 
-
 extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 40792424..c132df2 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -24,8 +24,11 @@ static atomic_t nest_events;
 /* Used to avoid races in calling enable/disable nest-pmu units*/
 static DEFINE_MUTEX(imc_nest_reserve);
 
+struct imc_pmu *core_imc_pmu;
+
 /* Needed for sanity check */
 extern u64 nest_max_offset;
+extern u64 core_max_offset;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 static struct attribute *imc_format_attrs[] = {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 61f6d67..d712ef3 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,6 +34,7 @@
 #include <asm/imc-pmu.h>
 
 u64 nest_max_offset;
+u64 core_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -114,6 +115,10 @@ static void update_max_value(u32 value, int pmu_domain)
 		if (nest_max_offset < value)
 			nest_max_offset = value;
 		break;
+	case IMC_DOMAIN_CORE:
+		if (core_max_offset < value)
+			core_max_offset = value;
+		break;
 	default:
 		/* Unknown domain, return */
 		return;
@@ -357,7 +362,7 @@ static struct imc_events *imc_events_setup(struct device_node *parent,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit and a
  *                  pmu_index as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu. Calls imc_events_node_parser()
  * to setup the individual events.
@@ -386,7 +391,10 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 		goto free_pmu;
 
 	/* Needed for hotplug/migration */
-	per_nest_pmu_arr[pmu_index] = pmu_ptr;
+	if (pmu_ptr->domain == IMC_DOMAIN_CORE)
+		core_imc_pmu = pmu_ptr;
+	else if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+		per_nest_pmu_arr[pmu_index] = pmu_ptr;
 
 	pp = of_find_property(parent, "name", NULL);
 	if (!pp) {
@@ -407,7 +415,10 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 		goto free_pmu;
 	}
 	/* Save the name to register it later */
-	sprintf(buf, "nest_%s", (char *)pp->value);
+	if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+		sprintf(buf, "nest_%s", (char *)pp->value);
+	else
+		sprintf(buf, "%s_imc", (char *)pp->value);
 	pmu_ptr->pmu.name = (char *)buf;
 
 	/*
@@ -461,6 +472,17 @@ static void __init imc_pmu_setup(struct device_node *parent)
 			return;
 		pmu_count++;
 	}
+	/*
+	 * Loop through the imc-counters tree for each compatible
+	 * "ibm,imc-counters-core", and update "struct imc_pmu".
+	 */
+	for_each_compatible_node(child, NULL, IMC_DTB_CORE_COMPAT) {
+		domain = IMC_DOMAIN_CORE;
+		rc = imc_pmu_create(child, pmu_count, domain);
+		if (rc)
+			return;
+		pmu_count++;
+	}
 }
 
 static int opal_imc_counters_probe(struct platform_device *pdev)
-- 
2.7.4


* [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (5 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-17  7:57   ` Stewart Smith
  2017-05-04 14:19 ` [PATCH v8 08/10] powerpc/powernv: Thread IMC events detection Anju T Sudhakar
                   ` (2 subsequent siblings)
  9 siblings, 1 reply; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Hemant Kumar <hemant@linux.vnet.ibm.com>

This patch adds the PMU function to initialize a core IMC event. It also
adds the cpumask initialization function for the core IMC PMU. For
initialization, 8KB of memory is allocated per core, where the data
for the core IMC counters will be accumulated. The base address of this
page is sent to OPAL via an OPAL call, which initializes various SCOMs
related to core IMC initialization. Upon any error, the pages are
freed and the core IMC counters are disabled using the same OPAL call.

For CPU hotplugging, a cpumask is initialized which contains one online
CPU from each core. If a CPU goes offline, we check whether it belongs
to the core IMC cpumask; if so, we migrate the PMU context to any other
online CPU (if available) in that core. If a CPU comes back online, it
is added to the core IMC cpumask only if no other CPU from that core was
in the previous cpumask.

To register the hotplug functions for core_imc, a new state
CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE is added to the list of existing
states.

This patch also adds an OPAL device shutdown callback, which is needed
to disable the IMC core engine to handle kexec.

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   7 +
 arch/powerpc/perf/imc-pmu.c               | 380 +++++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/opal-imc.c |   7 +
 include/linux/cpuhotplug.h                |   1 +
 4 files changed, 384 insertions(+), 11 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 37fdd79..bf5fb7c 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -24,6 +24,7 @@
  */
 #define IMC_MAX_CHIPS			32
 #define IMC_MAX_PMUS			32
+#define IMC_MAX_CORES			32
 
 /*
  * This macro is used for memory buffer allocation of
@@ -38,6 +39,11 @@
 #define IMC_NEST_MAX_PAGES		64
 
 /*
+ * IMC Core engine expects 8K bytes of memory for counter collection.
+ */
+#define IMC_CORE_COUNTER_MEM		8192
+
+/*
 * Compatibility macros for IMC devices
  */
 #define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
@@ -101,4 +107,5 @@ extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
+void core_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index c132df2..fb71825 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -1,5 +1,5 @@
 /*
- * Nest Performance Monitor counter support.
+ * IMC Performance Monitor counter support.
  *
  * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
  *           (C) 2017 Anju T Sudhakar, IBM Corporation.
@@ -21,9 +21,21 @@ struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
 
 static atomic_t nest_events;
+static atomic_t core_events;
 /* Used to avoid races in calling enable/disable nest-pmu units*/
 static DEFINE_MUTEX(imc_nest_reserve);
+/* Used to avoid races in calling enable/disable core-pmu units */
+static DEFINE_MUTEX(imc_core_reserve);
 
+/*
+ * Maintains base addresses for all the cores.
+ * The maximum chip and core counts are both 32, so
+ * we statically allocate 8KB for this structure.
+ *
+ * TODO -- Could be made dynamic
+ */
+static u64 per_core_pdbar_add[IMC_MAX_CHIPS][IMC_MAX_CORES];
+static cpumask_t core_imc_cpumask;
 struct imc_pmu *core_imc_pmu;
 
 /* Needed for sanity check */
@@ -46,9 +58,15 @@ static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
 					struct device_attribute *attr,
 					char *buf)
 {
+	struct pmu *pmu = dev_get_drvdata(dev);
 	cpumask_t *active_mask;
 
-	active_mask = &nest_imc_cpumask;
+	if (!strncmp(pmu->name, "nest_", strlen("nest_")))
+		active_mask = &nest_imc_cpumask;
+	else if (!strncmp(pmu->name, "core_", strlen("core_")))
+		active_mask = &core_imc_cpumask;
+	else
+		return 0;
 	return cpumap_print_to_pagebuf(true, buf, active_mask);
 }
 
@@ -64,6 +82,100 @@ static struct attribute_group imc_pmu_cpumask_attr_group = {
 };
 
 /*
+ * core_imc_mem_init : Initializes memory for the current core.
+ *
+ * Allocates memory via alloc_pages_exact_nid() and passes the returned
+ * address to an opal call to configure the pdbar. The address is
+ * converted to a physical address before the opal call is made. This is the
+ * base address at which the core imc counters are populated.
+ */
+static int __meminit core_imc_mem_init(void)
+{
+	int core_id, phys_id;
+	int rc = -1;
+
+	phys_id = topology_physical_package_id(smp_processor_id());
+	core_id = smp_processor_id() / threads_per_core;
+
+	/*
+	 * alloc_pages_exact_nid() will allocate memory for the core in the
+	 * local node only.
+	 */
+	per_core_pdbar_add[phys_id][core_id] = (u64) alloc_pages_exact_nid(phys_id,
+			(size_t) IMC_CORE_COUNTER_MEM, GFP_KERNEL | __GFP_ZERO);
+	rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_CORE,
+				       (u64)virt_to_phys((void *)per_core_pdbar_add[phys_id][core_id]));
+
+	return rc;
+}
+
+/*
+ * Calls core_imc_mem_init and checks the return value.
+ */
+static void core_imc_init(int *cpu_opal_rc)
+{
+	int rc = 0;
+
+	rc = core_imc_mem_init();
+	if (rc)
+		cpu_opal_rc[smp_processor_id()] = 1;
+}
+
+static void core_imc_change_cpu_context(int old_cpu, int new_cpu)
+{
+	if (!core_imc_pmu)
+		return;
+	perf_pmu_migrate_context(&core_imc_pmu->pmu, old_cpu, new_cpu);
+}
+
+
+static int ppc_core_imc_cpu_online(unsigned int cpu)
+{
+	int ret;
+
+	/* If a cpu for this core is already set in the mask, don't do anything */
+	ret = cpumask_any_and(&core_imc_cpumask,
+				 cpu_sibling_mask(cpu));
+	if (ret < nr_cpu_ids)
+		return 0;
+
+	/* Else, set the cpu in the mask, and change the context */
+	cpumask_set_cpu(cpu, &core_imc_cpumask);
+	opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE);
+	core_imc_change_cpu_context(-1, cpu);
+	return 0;
+}
+
+static int ppc_core_imc_cpu_offline(unsigned int cpu)
+{
+	int target;
+	unsigned int ncpu;
+
+	/*
+	 * Clear this cpu out of the mask; if it was not present in the
+	 * mask, don't bother doing anything.
+	 */
+	if (!cpumask_test_and_clear_cpu(cpu, &core_imc_cpumask))
+		return 0;
+
+	/* Find any online cpu in that core except the current "cpu" */
+	ncpu = cpumask_any_but(cpu_sibling_mask(cpu), cpu);
+
+	if (ncpu < nr_cpu_ids) {
+		target = ncpu;
+		cpumask_set_cpu(target, &core_imc_cpumask);
+	} else {
+		opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE);
+		target = -1;
+	}
+
+	/* migrate the context */
+	core_imc_change_cpu_context(cpu, target);
+
+	return 0;
+}
+
+/*
  * nest_init : Initializes the nest imc engine for the current chip.
  * by default the nest engine is disabled.
  */
@@ -195,6 +307,97 @@ static int nest_pmu_cpumask_init(void)
 	return -ENODEV;
 }
 
+static void cleanup_core_imc_memory(void)
+{
+	int phys_id, core_id;
+	u64 addr;
+
+	phys_id = topology_physical_package_id(smp_processor_id());
+	core_id = smp_processor_id() / threads_per_core;
+
+	addr = per_core_pdbar_add[phys_id][core_id];
+
+	/* Free the page only if the address is non-zero */
+	if (addr)
+		free_pages(addr, 0);
+}
+
+static void cleanup_all_core_imc_memory(void)
+{
+	on_each_cpu_mask(&core_imc_cpumask,
+			 (smp_call_func_t)cleanup_core_imc_memory, NULL, 1);
+}
+
+/* Enabling of Core Engine needs a scom operation */
+static void core_imc_control_enable(void)
+{
+	opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE);
+}
+
+
+/*
+ * Disabling of IMC Core Engine needs a scom operation
+ */
+static void core_imc_control_disable(void)
+{
+	opal_imc_counters_stop(OPAL_IMC_COUNTERS_CORE);
+}
+
+/*
+ * Function to disable the IMC core engine using the core imc cpumask
+ */
+void core_imc_disable(void)
+{
+	on_each_cpu_mask(&core_imc_cpumask,
+			 (smp_call_func_t)core_imc_control_disable, NULL, 1);
+}
+
+static int core_imc_pmu_cpumask_init(void)
+{
+	int cpu, *cpus_opal_rc;
+
+	/*
+	 * Get the mask of first online cpus for every core.
+	 */
+	core_imc_cpumask = cpu_online_cores_map();
+
+	/*
+	 * Memory for OPAL call return value.
+	 */
+	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+	if (!cpus_opal_rc)
+		goto fail;
+
+	/*
+	 * Initialize the core IMC PMU on each core using the
+	 * core_imc_cpumask by calling core_imc_init().
+	 */
+	on_each_cpu_mask(&core_imc_cpumask, (smp_call_func_t)core_imc_init,
+						(void *)cpus_opal_rc, 1);
+
+	/* Check return value array for any OPAL call failure */
+	for_each_cpu(cpu, &core_imc_cpumask) {
+		if (cpus_opal_rc[cpu]) {
+			kfree(cpus_opal_rc);
+			goto fail;
+		}
+	}
+
+	kfree(cpus_opal_rc);
+
+	cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE,
+			  "POWER_CORE_IMC_ONLINE",
+			  ppc_core_imc_cpu_online,
+			  ppc_core_imc_cpu_offline);
+
+	return 0;
+
+fail:
+	/* Free up the allocated pages */
+	cleanup_all_core_imc_memory();
+	return -ENODEV;
+}
+
 static int nest_imc_event_init(struct perf_event *event)
 {
 	int chip_id;
@@ -238,6 +441,44 @@ static int nest_imc_event_init(struct perf_event *event)
 	return 0;
 }
 
+static int core_imc_event_init(struct perf_event *event)
+{
+	int core_id, phys_id;
+	u64 config = event->attr.config;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	event->hw.idx = -1;
+
+	/* Sanity check for config (event offset) */
+	if (config > core_max_offset)
+		return -EINVAL;
+
+	core_id = event->cpu / threads_per_core;
+	phys_id = topology_physical_package_id(event->cpu);
+	event->hw.event_base =
+		per_core_pdbar_add[phys_id][core_id] + config;
+
+	return 0;
+}
+
 static void imc_read_counter(struct perf_event *event)
 {
 	u64 *addr, data;
@@ -384,6 +625,100 @@ static int nest_imc_event_add(struct perf_event *event, int flags)
 	return 0;
 }
 
+static int core_imc_control(int operation)
+{
+	int cpu, *cpus_opal_rc;
+
+	/*
+	 * Memory for OPAL call return value.
+	 */
+	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
+	if (!cpus_opal_rc)
+		goto fail;
+
+	/*
+	 * Enable or disable the core IMC PMU on each core using the
+	 * core_imc_cpumask.
+	 */
+	switch (operation) {
+
+	case IMC_COUNTER_DISABLE:
+		on_each_cpu_mask(&core_imc_cpumask,
+				(smp_call_func_t)core_imc_control_disable,
+				(void *)cpus_opal_rc, 1);
+		break;
+	case IMC_COUNTER_ENABLE:
+		on_each_cpu_mask(&core_imc_cpumask,
+				(smp_call_func_t)core_imc_control_enable,
+				(void *)cpus_opal_rc, 1);
+		break;
+	default:
+		goto fail;
+	}
+
+	/* Check return value array for any OPAL call failure */
+	for_each_cpu(cpu, &core_imc_cpumask) {
+		if (cpus_opal_rc[cpu])
+			goto fail;
+	}
+
+	return 0;
+fail:
+	if (cpus_opal_rc)
+		kfree(cpus_opal_rc);
+	return -EINVAL;
+}
+
+
+static void core_imc_event_start(struct perf_event *event, int flags)
+{
+	int rc;
+
+	/*
+	 * Core pmu units are enabled only when in use.
+	 * See if this is triggered for the first time.
+	 * If yes, take the mutex lock and enable the core counters.
+	 * If not, just increment the count in core_events.
+	 */
+	if (atomic_inc_return(&core_events) == 1) {
+		mutex_lock(&imc_core_reserve);
+		rc = core_imc_control(IMC_COUNTER_ENABLE);
+		mutex_unlock(&imc_core_reserve);
+		if (rc)
+			pr_err("IMC: Unable to start the counters\n");
+	}
+	imc_event_start(event, flags);
+}
+
+static void core_imc_event_stop(struct perf_event *event, int flags)
+{
+	int rc;
+
+	imc_event_stop(event, flags);
+	/*
+	 * See if we need to disable the IMC PMU.
+	 * If no events are currently in use, then we have to take a
+	 * mutex to ensure that we don't race with another task doing
+	 * enable or disable the core counters.
+	 */
+	if (atomic_dec_return(&core_events) == 0) {
+		mutex_lock(&imc_core_reserve);
+		rc = core_imc_control(IMC_COUNTER_DISABLE);
+		mutex_unlock(&imc_core_reserve);
+		if (rc)
+			pr_err("IMC: Disable counters failed\n");
+	}
+}
+
+static int core_imc_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		core_imc_event_start(event, flags);
+
+	return 0;
+}
+
+
 /* update_pmu_ops : Populate the appropriate operations for "pmu" */
 static int update_pmu_ops(struct imc_pmu *pmu)
 {
@@ -391,13 +726,22 @@ static int update_pmu_ops(struct imc_pmu *pmu)
 		return -EINVAL;
 
 	pmu->pmu.task_ctx_nr = perf_invalid_context;
-	pmu->pmu.event_init = nest_imc_event_init;
-	pmu->pmu.add = nest_imc_event_add;
-	pmu->pmu.del = nest_imc_event_stop;
-	pmu->pmu.start = nest_imc_event_start;
-	pmu->pmu.stop = nest_imc_event_stop;
+	if (pmu->domain == IMC_DOMAIN_NEST) {
+		pmu->pmu.event_init = nest_imc_event_init;
+		pmu->pmu.add = nest_imc_event_add;
+		pmu->pmu.del = nest_imc_event_stop;
+		pmu->pmu.start = nest_imc_event_start;
+		pmu->pmu.stop = nest_imc_event_stop;
+		pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
+	} else if (pmu->domain == IMC_DOMAIN_CORE) {
+		pmu->pmu.event_init = core_imc_event_init;
+		pmu->pmu.add = core_imc_event_add;
+		pmu->pmu.del = core_imc_event_stop;
+		pmu->pmu.start = core_imc_event_start;
+		pmu->pmu.stop = core_imc_event_stop;
+		pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
+	}
 	pmu->pmu.read = imc_perf_event_update;
-	pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
 	pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 	pmu->pmu.attr_groups = pmu->attr_groups;
 
@@ -477,9 +821,20 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
 	int ret = -ENODEV;
 
 	/* Add cpumask and register for hotplug notification */
-	ret = nest_pmu_cpumask_init();
-	if (ret)
-		return ret;
+	switch (pmu_ptr->domain) {
+	case IMC_DOMAIN_NEST:
+		ret = nest_pmu_cpumask_init();
+		if (ret)
+			return ret;
+		break;
+	case IMC_DOMAIN_CORE:
+		ret = core_imc_pmu_cpumask_init();
+		if (ret)
+			return ret;
+		break;
+	default:
+		return -1;  /* Unknown domain */
+	}
 
 	ret = update_events_in_group(events, idx, pmu_ptr);
 	if (ret)
@@ -505,6 +860,9 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
 			kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
 		kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
 	}
+	/* For core_imc, we have allocated memory, we need to free it */
+	if (pmu_ptr->domain == IMC_DOMAIN_CORE)
+		cleanup_all_core_imc_memory();
 
 	return ret;
 }
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index d712ef3..23507d7 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -562,6 +562,12 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 	return -ENODEV;
 }
 
+static void opal_imc_counters_shutdown(struct platform_device *pdev)
+{
+	/* Disable the IMC Core functions */
+	core_imc_disable();
+}
+
 static const struct of_device_id opal_imc_match[] = {
 	{ .compatible = IMC_DTB_COMPAT },
 	{},
@@ -573,6 +579,7 @@ static struct platform_driver opal_imc_driver = {
 		.of_match_table = opal_imc_match,
 	},
 	.probe = opal_imc_counters_probe,
+	.shutdown = opal_imc_counters_shutdown,
 };
 
 MODULE_DEVICE_TABLE(of, opal_imc_match);
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index 51dff54..e7b7712 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -138,6 +138,7 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_ARM_L2X0_ONLINE,
 	CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
 	CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
+	CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
-- 
2.7.4


* [PATCH v8 08/10] powerpc/powernv: Thread IMC events detection
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (6 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 10/10] powerpc/perf: Thread imc cpuhotplug support Anju T Sudhakar
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

This patch adds support for detection of thread IMC events. It adds a
new domain, IMC_DOMAIN_THREAD, which is determined via the compatibility
string "ibm,imc-counters-thread" in the IMC device tree.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |  2 ++
 arch/powerpc/perf/imc-pmu.c               |  1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 18 +++++++++++++++++-
 3 files changed, 20 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index bf5fb7c..6260e61 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -49,6 +49,7 @@
 #define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
 #define IMC_DTB_NEST_COMPAT		"ibm,imc-counters-nest"
 #define IMC_DTB_CORE_COMPAT		"ibm,imc-counters-core"
+#define IMC_DTB_THREAD_COMPAT		"ibm,imc-counters-thread"
 
 /*
  * Structure to hold per chip specific memory address
@@ -98,6 +99,7 @@ struct imc_pmu {
  */
 #define IMC_DOMAIN_NEST		1
 #define IMC_DOMAIN_CORE		2
+#define IMC_DOMAIN_THREAD	3
 #define IMC_DOMAIN_UNKNOWN	-1
 
 #define IMC_COUNTER_ENABLE	1
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index fb71825..9767714 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -41,6 +41,7 @@ struct imc_pmu *core_imc_pmu;
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
+extern u64 thread_max_offset;
 
 PMU_FORMAT_ATTR(event, "config:0-20");
 static struct attribute *imc_format_attrs[] = {
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 23507d7..940f6b9 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -35,6 +35,7 @@
 
 u64 nest_max_offset;
 u64 core_max_offset;
+u64 thread_max_offset;
 
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
@@ -119,6 +120,10 @@ static void update_max_value(u32 value, int pmu_domain)
 		if (core_max_offset < value)
 			core_max_offset = value;
 		break;
+	case IMC_DOMAIN_THREAD:
+		if (thread_max_offset < value)
+			thread_max_offset = value;
+		break;
 	default:
 		/* Unknown domain, return */
 		return;
@@ -362,7 +367,7 @@ static struct imc_events *imc_events_setup(struct device_node *parent,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit and a
  *                  pmu_index as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE/THREAD), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu. Calls imc_events_node_parser()
  * to setup the individual events.
@@ -483,6 +488,17 @@ static void __init imc_pmu_setup(struct device_node *parent)
 			return;
 		pmu_count++;
 	}
+	/*
+	 * Loop through the imc-counters tree for each compatible
+	 * "ibm,imc-counters-thread", and update "struct imc_pmu".
+	 */
+	for_each_compatible_node(child, NULL, IMC_DTB_THREAD_COMPAT) {
+		domain = IMC_DOMAIN_THREAD;
+		rc = imc_pmu_create(child, pmu_count, domain);
+		if (rc)
+			return;
+		pmu_count++;
+	}
 }
 
 static int opal_imc_counters_probe(struct platform_device *pdev)
-- 
2.7.4


* [PATCH v8 09/10] powerpc/perf: Thread IMC PMU functions
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (7 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 08/10] powerpc/powernv: Thread IMC events detection Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  2017-05-04 14:19 ` [PATCH v8 10/10] powerpc/perf: Thread imc cpuhotplug support Anju T Sudhakar
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

This patch adds the PMU functions required for event initialization,
read, update, add, del, etc. for the thread IMC PMU. Thread IMC PMUs
are used for per-task monitoring.

For each CPU, a page of memory is allocated and kept static, i.e.
these pages exist until the machine shuts down. The base address of
this page is written to the LDBAR of that CPU. As soon as that is done,
the thread IMC counters start running for that CPU, and their data is
accumulated in the allocated page. We use this for per-task monitoring:
when we start monitoring a task, the event is added onto the task and
the initial value of the event is read. When we stop monitoring the
task, the final value is taken, and the difference is the event data.

Now, a task can move to a different CPU. Suppose a task X moves from
CPU A to CPU B. When the task is scheduled out of A, we get an
event_del for A and the event data is updated; we then stop updating
X's event data. As soon as X moves onto B, event_add is called for B
and we update the event data again. In this way the event data keeps
getting updated even as the task is scheduled onto different CPUs.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   5 +
 arch/powerpc/perf/imc-pmu.c               | 209 +++++++++++++++++++++++++++++-
 arch/powerpc/platforms/powernv/opal-imc.c |   3 +
 3 files changed, 216 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 6260e61..cc04712 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -42,6 +42,7 @@
  * IMC Core engine expects 8K bytes of memory for counter collection.
  */
 #define IMC_CORE_COUNTER_MEM		8192
+#define IMC_THREAD_COUNTER_MEM		8192
 
 /*
 * Compatibility macros for IMC devices
@@ -51,6 +52,9 @@
 #define IMC_DTB_CORE_COMPAT		"ibm,imc-counters-core"
 #define IMC_DTB_THREAD_COMPAT		"ibm,imc-counters-thread"
 
+#define THREAD_IMC_LDBAR_MASK           0x0003ffffffffe000
+#define THREAD_IMC_ENABLE               0x8000000000000000
+
 /*
  * Structure to hold per chip specific memory address
  * information for nest pmus. Nest Counter data are exported
@@ -110,4 +114,5 @@ extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
 void core_imc_disable(void);
+void thread_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index 9767714..cfd112e 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -38,6 +38,9 @@ static u64 per_core_pdbar_add[IMC_MAX_CHIPS][IMC_MAX_CORES];
 static cpumask_t core_imc_cpumask;
 struct imc_pmu *core_imc_pmu;
 
+/* Maintains base address for all the cpus */
+static u64 per_cpu_add[NR_CPUS];
+
 /* Needed for sanity check */
 extern u64 nest_max_offset;
 extern u64 core_max_offset;
@@ -480,6 +483,56 @@ static int core_imc_event_init(struct perf_event *event)
 	return 0;
 }
 
+static int thread_imc_event_init(struct perf_event *event)
+{
+	struct task_struct *target;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	event->hw.idx = -1;
+
+	/* Sanity check for config (event offset) */
+	if (event->attr.config > thread_max_offset)
+		return -EINVAL;
+
+	target = event->hw.target;
+
+	if (!target)
+		return -EINVAL;
+
+	event->pmu->task_ctx_nr = perf_sw_context;
+	return 0;
+}
+
+static void thread_imc_read_counter(struct perf_event *event)
+{
+	u64 *addr, data;
+	int cpu_id = smp_processor_id();
+
+	addr = (u64 *)(per_cpu_add[cpu_id] + event->attr.config);
+	data = __be64_to_cpu(READ_ONCE(*addr));
+	local64_set(&event->hw.prev_count, data);
+}
+
+static void thread_imc_perf_event_update(struct perf_event *event)
+{
+	u64 counter_prev, counter_new, final_count, *addr;
+	int cpu_id = smp_processor_id();
+
+	addr = (u64 *)(per_cpu_add[cpu_id] + event->attr.config);
+	counter_prev = local64_read(&event->hw.prev_count);
+	counter_new = __be64_to_cpu(READ_ONCE(*addr));
+	final_count = counter_new - counter_prev;
+
+	local64_set(&event->hw.prev_count, counter_new);
+	local64_add(final_count, &event->count);
+}
+
 static void imc_read_counter(struct perf_event *event)
 {
 	u64 *addr, data;
@@ -720,6 +773,84 @@ static int core_imc_event_add(struct perf_event *event, int flags)
 }
 
 
+static void thread_imc_event_start(struct perf_event *event, int flags)
+{
+	int rc;
+
+	/*
+	 * Core pmu units are enabled only when in use.
+	 * See if this is triggered for the first time.
+	 * If yes, take the mutex lock and enable the core counters.
+	 * If not, just increment the count in core_events.
+	 */
+	if (atomic_inc_return(&core_events) == 1) {
+		mutex_lock(&imc_core_reserve);
+		rc = core_imc_control(IMC_COUNTER_ENABLE);
+		mutex_unlock(&imc_core_reserve);
+		if (rc)
+			pr_err("IMC: Unable to start the counters\n");
+	}
+	thread_imc_read_counter(event);
+}
+
+static void thread_imc_event_stop(struct perf_event *event, int flags)
+{
+	int rc;
+
+	thread_imc_perf_event_update(event);
+	/*
+	 * See if we need to disable the IMC PMU.
+	 * If no events are currently in use, then we have to take a
+	 * mutex to ensure that we don't race with another task doing
+	 * enable or disable the core counters.
+	 */
+	if (atomic_dec_return(&core_events) == 0) {
+		mutex_lock(&imc_core_reserve);
+		rc = core_imc_control(IMC_COUNTER_DISABLE);
+		mutex_unlock(&imc_core_reserve);
+		if (rc)
+			pr_err("IMC: Disable counters failed\n");
+
+	}
+}
+
+static void thread_imc_event_del(struct perf_event *event, int flags)
+{
+	thread_imc_perf_event_update(event);
+}
+
+static int thread_imc_event_add(struct perf_event *event, int flags)
+{
+	thread_imc_event_start(event, flags);
+
+	return 0;
+}
+
+static void thread_imc_pmu_start_txn(struct pmu *pmu,
+				     unsigned int txn_flags)
+{
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+	perf_pmu_disable(pmu);
+}
+
+static void thread_imc_pmu_cancel_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+}
+
+static int thread_imc_pmu_commit_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+	return 0;
+}
+
+static void thread_imc_pmu_sched_task(struct perf_event_context *ctx,
+				  bool sched_in)
+{
+	return;
+}
+
 /* update_pmu_ops : Populate the appropriate operations for "pmu" */
 static int update_pmu_ops(struct imc_pmu *pmu)
 {
@@ -745,7 +876,26 @@ static int update_pmu_ops(struct imc_pmu *pmu)
 	pmu->pmu.read = imc_perf_event_update;
 	pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
 	pmu->pmu.attr_groups = pmu->attr_groups;
-
+	if (pmu->domain == IMC_DOMAIN_THREAD) {
+		pmu->pmu.event_init = thread_imc_event_init;
+		pmu->pmu.start = thread_imc_event_start;
+		pmu->pmu.add = thread_imc_event_add;
+		pmu->pmu.del = thread_imc_event_del;
+		pmu->pmu.stop = thread_imc_event_stop;
+		pmu->pmu.read = thread_imc_perf_event_update;
+		pmu->pmu.start_txn = thread_imc_pmu_start_txn;
+		pmu->pmu.cancel_txn = thread_imc_pmu_cancel_txn;
+		pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
+		pmu->pmu.sched_task = thread_imc_pmu_sched_task;
+
+		/*
+		 * Since thread_imc does not have any CPUMASK attr,
+		 * this may drop the "events" attr altogether.
+		 * So swap the IMC_EVENT_ATTR slot with IMC_CPUMASK_ATTR.
+		 */
+		pmu->attr_groups[IMC_CPUMASK_ATTR] = pmu->attr_groups[IMC_EVENT_ATTR];
+		pmu->attr_groups[IMC_EVENT_ATTR] = NULL;
+	}
 	return 0;
 }
 
@@ -806,6 +956,56 @@ static int update_events_in_group(struct imc_events *events,
 	return 0;
 }
 
+static void thread_imc_ldbar_disable(void *dummy)
+{
+	/* LDBAR is a per-thread SPR */
+	mtspr(SPRN_LDBAR, 0);
+}
+
+void thread_imc_disable(void)
+{
+	on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+}
+
+static void cleanup_thread_imc_memory(void *dummy)
+{
+	int cpu_id = smp_processor_id();
+	u64 addr = per_cpu_add[cpu_id];
+
+	/* Free the page only if the address is non-zero */
+	if (addr)
+		free_pages(addr, 0);
+}
+
+static void cleanup_all_thread_imc_memory(void)
+{
+	on_each_cpu(cleanup_thread_imc_memory, NULL, 1);
+}
+
+/*
+ * Allocates a page of memory for each of the online cpus, and writes the
+ * physical base address of that page to the LDBAR for that cpu. This starts
+ * the thread IMC counters.
+ */
+static void thread_imc_mem_alloc(void *dummy)
+{
+	u64 ldbar_addr, ldbar_value;
+	int cpu_id = smp_processor_id();
+	int phys_id = topology_physical_package_id(smp_processor_id());
+
+	per_cpu_add[cpu_id] = (u64)alloc_pages_exact_nid(phys_id,
+			(size_t)IMC_THREAD_COUNTER_MEM, GFP_KERNEL | __GFP_ZERO);
+	ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);
+	ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
+		(u64)THREAD_IMC_ENABLE;
+	mtspr(SPRN_LDBAR, ldbar_value);
+}
+
+void thread_imc_cpu_init(void)
+{
+	on_each_cpu(thread_imc_mem_alloc, NULL, 1);
+}
+
 /*
  * init_imc_pmu : Setup and register the IMC pmu device.
  *
@@ -833,6 +1033,9 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
 		if (ret)
 			return ret;
 		break;
+	case IMC_DOMAIN_THREAD:
+		thread_imc_cpu_init();
+		break;
 	default:
 		return -1;  /* Unknown domain */
 	}
@@ -865,5 +1068,9 @@ int __init init_imc_pmu(struct imc_events *events, int idx,
 	if (pmu_ptr->domain == IMC_DOMAIN_CORE)
 		cleanup_all_core_imc_memory();
 
+	/* For thread_imc, we have allocated memory, we need to free it */
+	if (pmu_ptr->domain == IMC_DOMAIN_THREAD)
+		cleanup_all_thread_imc_memory();
+
 	return ret;
 }
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 940f6b9..e36722b 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -582,6 +582,9 @@ static void opal_imc_counters_shutdown(struct platform_device *pdev)
 {
 	/* Disable the IMC Core functions */
 	core_imc_disable();
+
+	/* Disable the IMC Thread functions */
+	thread_imc_disable();
 }
 
 static const struct of_device_id opal_imc_match[] = {
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* [PATCH v8 10/10] powerpc/perf: Thread imc cpuhotplug support
  2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (8 preceding siblings ...)
  2017-05-04 14:19 ` [PATCH v8 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
@ 2017-05-04 14:19 ` Anju T Sudhakar
  9 siblings, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-04 14:19 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

This patch adds support for thread IMC on cpuhotplug.

When a cpu goes offline, the LDBAR for that cpu is disabled, and when it comes
back online the previous LDBAR value is written back to the LDBAR for that cpu.

To register the hotplug functions for thread_imc, a new state
CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE is added to the list of existing
states.

Reviewed-by: Gautham R. Shenoy <ego@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/perf/imc-pmu.c | 32 +++++++++++++++++++++++++++-----
 include/linux/cpuhotplug.h  |  1 +
 2 files changed, 28 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index cfd112e..f10489f 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -982,6 +982,16 @@ static void cleanup_all_thread_imc_memory(void)
 	on_each_cpu(cleanup_thread_imc_memory, NULL, 1);
 }
 
+static void thread_imc_update_ldbar(unsigned int cpu_id)
+{
+	u64 ldbar_addr, ldbar_value;
+
+	ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);
+	ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
+			(u64)THREAD_IMC_ENABLE;
+	mtspr(SPRN_LDBAR, ldbar_value);
+}
+
 /*
 * Allocates a page of memory for each of the online cpus, and writes the
  * physical base address of that page to the LDBAR for that cpu. This starts
@@ -989,21 +999,33 @@ static void cleanup_all_thread_imc_memory(void)
  */
 static void thread_imc_mem_alloc(void *dummy)
 {
-	u64 ldbar_addr, ldbar_value;
 	int cpu_id = smp_processor_id();
 	int phys_id = topology_physical_package_id(smp_processor_id());
 
 	per_cpu_add[cpu_id] = (u64)alloc_pages_exact_nid(phys_id,
 			(size_t)IMC_THREAD_COUNTER_MEM, GFP_KERNEL | __GFP_ZERO);
-	ldbar_addr = (u64)virt_to_phys((void *)per_cpu_add[cpu_id]);
-	ldbar_value = (ldbar_addr & (u64)THREAD_IMC_LDBAR_MASK) |
-		(u64)THREAD_IMC_ENABLE;
-	mtspr(SPRN_LDBAR, ldbar_value);
+	thread_imc_update_ldbar(cpu_id);
+}
+
+static int ppc_thread_imc_cpu_online(unsigned int cpu)
+{
+	thread_imc_update_ldbar(cpu);
+	return 0;
 }
 
+static int ppc_thread_imc_cpu_offline(unsigned int cpu)
+{
+	mtspr(SPRN_LDBAR, 0);
+	return 0;
+}
+
 void thread_imc_cpu_init(void)
 {
 	on_each_cpu(thread_imc_mem_alloc, NULL, 1);
+	cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE,
+				"POWER_THREAD_IMC_ONLINE",
+				ppc_thread_imc_cpu_online,
+				ppc_thread_imc_cpu_offline);
 }
 
 /*
diff --git a/include/linux/cpuhotplug.h b/include/linux/cpuhotplug.h
index e7b7712..bbec927 100644
--- a/include/linux/cpuhotplug.h
+++ b/include/linux/cpuhotplug.h
@@ -139,6 +139,7 @@ enum cpuhp_state {
 	CPUHP_AP_PERF_ARM_QCOM_L2_ONLINE,
 	CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
 	CPUHP_AP_PERF_POWERPC_COREIMC_ONLINE,
+	CPUHP_AP_PERF_POWERPC_THREADIMC_ONLINE,
 	CPUHP_AP_WORKQUEUE_ONLINE,
 	CPUHP_AP_RCUTREE_ONLINE,
 	CPUHP_AP_ONLINE_DYN,
-- 
2.7.4

^ permalink raw reply related	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-04 14:19 ` [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support Anju T Sudhakar
@ 2017-05-08 14:12   ` Daniel Axtens
  2017-05-09  6:10     ` Madhavan Srinivasan
  2017-05-09 10:54     ` Anju T Sudhakar
  2017-05-10 12:09   ` Thomas Gleixner
  1 sibling, 2 replies; 26+ messages in thread
From: Daniel Axtens @ 2017-05-08 14:12 UTC (permalink / raw)
  To: Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, eranian, hemant, maddy, anju

Hi all,

I've had a look at the API as it was a big thing I didn't like in the
earlier version.

I am much happier with this one.

Some comments:

 - I'm no longer subscribed to skiboot but I've had a look at the
   patches on that side:

    * in patch 9 should opal_imc_counters_init return something other
      than OPAL_SUCCESS in the case of invalid arguments? Maybe
      OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

    * in start/stop, should there be some sort of write barrier to make
      sure the cb->imc_chip_command actually gets written out to memory
      at the time we expect?

The rest of my comments are in line.

> Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
> online CPU) from each chip for nest PMUs is designated to read counters.
>
> On CPU hotplug, dying CPU is checked to see whether it is one of the
> designated cpus, if yes, next online cpu from the same chip (for nest
> units) is designated as new cpu to read counters. For this purpose, we
> introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.
>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/imc-pmu.h             |   4 +
>  arch/powerpc/include/asm/opal-api.h            |  12 +-
>  arch/powerpc/include/asm/opal.h                |   4 +
>  arch/powerpc/perf/imc-pmu.c                    | 248 ++++++++++++++++++++++++-
>  arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
>  include/linux/cpuhotplug.h                     |   1 +

Who owns this? get_maintainer.pl doesn't give me anything helpful
here... Do we need an Ack from anyone?

>  6 files changed, 266 insertions(+), 6 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
> index 6bbe184..1478d0f 100644
> --- a/arch/powerpc/include/asm/imc-pmu.h
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -92,6 +92,10 @@ struct imc_pmu {
>  #define IMC_DOMAIN_NEST		1
>  #define IMC_DOMAIN_UNKNOWN	-1
>  
> +#define IMC_COUNTER_ENABLE	1
> +#define IMC_COUNTER_DISABLE	0

I'm not sure these constants are particularly useful any more, but I'll
have more to say on that later.

> +
> +
>  extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>  extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>  extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
> index a0aa285..ce863d9 100644
> --- a/arch/powerpc/include/asm/opal-api.h
> +++ b/arch/powerpc/include/asm/opal-api.h
> @@ -168,7 +168,10 @@
>  #define OPAL_INT_SET_MFRR			125
>  #define OPAL_PCI_TCE_KILL			126
>  #define OPAL_NMMU_SET_PTCR			127
> -#define OPAL_LAST				127
> +#define OPAL_IMC_COUNTERS_INIT			149
> +#define OPAL_IMC_COUNTERS_START			150
> +#define OPAL_IMC_COUNTERS_STOP			151

Yay, this is heaps better!

> +#define OPAL_LAST				151
>  
>  /* Device tree flags */
>  
> @@ -928,6 +931,13 @@ enum {
>  	OPAL_PCI_TCE_KILL_ALL,
>  };
>  
> +/* Argument to OPAL_IMC_COUNTERS_*  */
> +enum {
> +	OPAL_IMC_COUNTERS_NEST = 1,
> +	OPAL_IMC_COUNTERS_CORE = 2,
> +	OPAL_IMC_COUNTERS_THREAD = 3,
> +};
> +
>  #endif /* __ASSEMBLY__ */
>  
>  #endif /* __OPAL_API_H */
> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
> index 1ff03a6..9c16ec6 100644
> --- a/arch/powerpc/include/asm/opal.h
> +++ b/arch/powerpc/include/asm/opal.h
> @@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
>  			  uint64_t dma_addr, uint32_t npages);
>  int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
>  
> +int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
This isn't called anywhere in this patch... including (worryingly) in
the init function...

> +int64_t opal_imc_counters_start(uint32_t type);
> +int64_t opal_imc_counters_stop(uint32_t type);
> +
>  /* Internal functions */
>  extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>  				   int depth, void *data);
> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
> index f09a37a..40792424 100644
> --- a/arch/powerpc/perf/imc-pmu.c
> +++ b/arch/powerpc/perf/imc-pmu.c
> @@ -18,6 +18,11 @@
>  
>  struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>  struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
> +static cpumask_t nest_imc_cpumask;
> +
> +static atomic_t nest_events;
> +/* Used to avoid races in calling enable/disable nest-pmu units*/
You need a space here between s and * ----------------------------^

> +static DEFINE_MUTEX(imc_nest_reserve);
>  
>  /* Needed for sanity check */
>  extern u64 nest_max_offset;
> @@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
>  	.attrs = imc_format_attrs,
>  };
>  
> +/* Get the cpumask printed to a buffer "buf" */
> +static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	cpumask_t *active_mask;
> +
> +	active_mask = &nest_imc_cpumask;
> +	return cpumap_print_to_pagebuf(true, buf, active_mask);
> +}
> +
> +static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
> +
> +static struct attribute *imc_pmu_cpumask_attrs[] = {
> +	&dev_attr_cpumask.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group imc_pmu_cpumask_attr_group = {
> +	.attrs = imc_pmu_cpumask_attrs,
> +};
> +
> +/*
> + * nest_init : Initializes the nest imc engine for the current chip.
> + * by default the nest engine is disabled.
> + */
> +static void nest_init(int *cpu_opal_rc)
> +{
> +	int rc;
> +
> +	/*
> +	 * OPAL figures out which CPU to start based on the CPU that is
> +	 * currently running when we call into OPAL
> +	 */
> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
Why isn't this the init call? If this is correct, a comment explaining it
would be helpful.

> +	if (rc)
> +		cpu_opal_rc[smp_processor_id()] = 1;
> +}
> +


> +static int nest_imc_control(int operation)
> +{
> +	int *cpus_opal_rc, cpu;
> +
> +	/*
> +	 * Memory for OPAL call return value.
> +	 */
> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
> +	if (!cpus_opal_rc)
> +		return -ENOMEM;
> +	switch (operation) {
> +
> +	case	IMC_COUNTER_ENABLE:
> +			/* Initialize Nest PMUs in each node using designated cpus */
> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,
> +						(void *)cpus_opal_rc, 1);
> +			break;
> +	case	IMC_COUNTER_DISABLE:
> +			/* Disable the counters */
> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
> +						(void *)cpus_opal_rc, 1);
> +			break;
> +	default: return -EINVAL;
> +
> +	}
> +
> +	/* Check return value array for any OPAL call failure */
> +	for_each_cpu(cpu, &nest_imc_cpumask) {
> +		if (cpus_opal_rc[cpu])
> +			return -ENODEV;
> +	}
> +	return 0;
> +}
Two things:

 - It doesn't look like you're freeing cpus_opal_rc anywhere - have I
   missed it?

 - Would it be better to split this function into two: so instead of
   passing in `operation`, you just have a nest_imc_enable and
   nest_imc_disable? All the call sites I can see call this with a
   constant parameter anyway. Perhaps it could even be refactored into
   nest_imc_event_start/stop and this method could be removed
   entirely...

   (I haven't checked if you use this in future patches or if it gets
   expanded and makes sense to keep the function this way.)
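The shape being suggested here can be sketched in user space: dedicated enable/disable wrappers around one helper, with the per-CPU return-value array freed on every exit path. The kernel pieces (`on_each_cpu_mask()`, the OPAL calls, `nr_cpu_ids`) are mocked with a plain callback and a fixed CPU count, and all names are illustrative, not the real kernel symbols:

```c
#include <assert.h>
#include <stdlib.h>

#define NR_CPUS 4		/* stand-in for nr_cpu_ids */

typedef void (*percpu_fn)(int cpu, int *rc);

/* Run fn on every "cpu"; report failure if any per-cpu rc is set. */
static int nest_imc_run(percpu_fn fn)
{
	int *cpus_rc = calloc(NR_CPUS, sizeof(int));
	int cpu, ret = 0;

	if (!cpus_rc)
		return -1;
	for (cpu = 0; cpu < NR_CPUS; cpu++)	/* mocks on_each_cpu_mask() */
		fn(cpu, &cpus_rc[cpu]);
	for (cpu = 0; cpu < NR_CPUS; cpu++)
		if (cpus_rc[cpu])
			ret = -1;
	free(cpus_rc);		/* freed on every path, fixing the leak */
	return ret;
}

/* Mock per-cpu operations standing in for the OPAL start/stop calls. */
static void mock_start(int cpu, int *rc) { (void)cpu; *rc = 0; }
static void mock_stop(int cpu, int *rc)  { *rc = (cpu == 2); /* cpu 2 fails */ }

/* Two wrappers replace the `operation` switch entirely. */
static int nest_imc_enable(void)  { return nest_imc_run(mock_start); }
static int nest_imc_disable(void) { return nest_imc_run(mock_stop); }
```

Splitting the function this way removes the `IMC_COUNTER_ENABLE`/`IMC_COUNTER_DISABLE` constants from the call sites and makes the single free path easy to audit.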

> +
>  static void imc_event_start(struct perf_event *event, int flags)
>  {
>  	/*
> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>  	imc_perf_event_update(event);
>  }
>  
> -/*
> - * The wrapper function is provided here, since we will have reserve
> - * and release lock for imc_event_start() in the following patch.
> - * Same in case of imc_event_stop().
> - */
>  static void nest_imc_event_start(struct perf_event *event, int flags)
>  {
> +	int rc;
> +
> +	/*
> +	 * Nest pmu units are enabled only when it is used.
> +	 * See if this is triggered for the first time.
> +	 * If yes, take the mutex lock and enable the nest counters.
> +	 * If not, just increment the count in nest_events.
> +	 */
> +	if (atomic_inc_return(&nest_events) == 1) {
> +		mutex_lock(&imc_nest_reserve);
> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
> +		mutex_unlock(&imc_nest_reserve);
> +		if (rc)
> +			pr_err("IMC: Unbale to start the counters\n");
Spelling: s/Unbale/Unable/ ----------^

> +	}
>  	imc_event_start(event, flags);
>  }
>

Overall I'm much happier with this now, good work :)

Regards,
Daniel

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-08 14:12   ` Daniel Axtens
@ 2017-05-09  6:10     ` Madhavan Srinivasan
  2017-05-12  2:18       ` Stewart Smith
  2017-05-09 10:54     ` Anju T Sudhakar
  1 sibling, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2017-05-09  6:10 UTC (permalink / raw)
  To: Daniel Axtens, Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, eranian, hemant



On Monday 08 May 2017 07:42 PM, Daniel Axtens wrote:
> Hi all,
>
> I've had a look at the API as it was a big thing I didn't like in the
> earlier version.
>
> I am much happier with this one.

Thanks to mpe for suggesting this. :)

>
> Some comments:
>
>   - I'm no longer subscribed to skiboot but I've had a look at the
>     patches on that side:

Thanks a lot for the review comments.

>
>      * in patch 9 should opal_imc_counters_init return something other
>        than OPAL_SUCCESS in the case of invalid arguments? Maybe
>        OPAL_PARAMETER? (I think you fix this in a later patch anyway?)

So, the init call will return OPAL_PARAMETER for the unsupported
domains (core and nest are supported). And if the init operation
fails for any reason, it would return OPAL_HARDWARE. This is
documented.

>
>      * in start/stop, should there be some sort of write barrier to make
>        sure the cb->imc_chip_command actually gets written out to memory
>        at the time we expect?

In the current implementation we make the OPAL call in the
*_event_stop and *_event_start functions. But we wanted to
move the OPAL call to the corresponding *_event_init(), so this
avoids an OPAL call on each _event_start and _event_stop for
this PMU. With this change, we may not need the barrier.
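The saving described here can be sketched in isolation: if the (comparatively expensive) firmware call is made from a refcounted init/destroy pair, repeated start/stop cycles cost nothing extra. This is a user-space sketch with the OPAL calls replaced by a counter; all names are illustrative:

```c
#include <assert.h>

static int opal_calls;		/* counts simulated firmware calls */
static int active_events;	/* events currently initialized */

static void simulated_opal_counters_start(void) { opal_calls++; }
static void simulated_opal_counters_stop(void)  { opal_calls++; }

/* First user enables the engine: done once per init, not per start. */
static void event_init(void)
{
	if (++active_events == 1)
		simulated_opal_counters_start();
}

/* Last user disables it again. */
static void event_destroy(void)
{
	if (--active_events == 0)
		simulated_opal_counters_stop();
}

/* start/stop now touch only fast per-event state: no firmware call. */
static void event_start(void) { }
static void event_stop(void)  { }
```

Two events each started and stopped many times still result in exactly two simulated firmware calls (one enable, one disable).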

Maddy

>
> The rest of my comments are in line.
>
>> Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
>> online CPU) from each chip for nest PMUs is designated to read counters.
>>
>> On CPU hotplug, dying CPU is checked to see whether it is one of the
>> designated cpus, if yes, next online cpu from the same chip (for nest
>> units) is designated as new cpu to read counters. For this purpose, we
>> introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.
>>
>> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
>> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/imc-pmu.h             |   4 +
>>   arch/powerpc/include/asm/opal-api.h            |  12 +-
>>   arch/powerpc/include/asm/opal.h                |   4 +
>>   arch/powerpc/perf/imc-pmu.c                    | 248 ++++++++++++++++++++++++-
>>   arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
>>   include/linux/cpuhotplug.h                     |   1 +
> Who owns this? get_maintainer.pl doesn't give me anything helpful
> here... Do we need an Ack from anyone?
>
>>   6 files changed, 266 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
>> index 6bbe184..1478d0f 100644
>> --- a/arch/powerpc/include/asm/imc-pmu.h
>> +++ b/arch/powerpc/include/asm/imc-pmu.h
>> @@ -92,6 +92,10 @@ struct imc_pmu {
>>   #define IMC_DOMAIN_NEST		1
>>   #define IMC_DOMAIN_UNKNOWN	-1
>>   
>> +#define IMC_COUNTER_ENABLE	1
>> +#define IMC_COUNTER_DISABLE	0
> I'm not sure these constants are particularly useful any more, but I'll
> have more to say on that later.
>
>> +
>> +
>>   extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>>   extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index a0aa285..ce863d9 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -168,7 +168,10 @@
>>   #define OPAL_INT_SET_MFRR			125
>>   #define OPAL_PCI_TCE_KILL			126
>>   #define OPAL_NMMU_SET_PTCR			127
>> -#define OPAL_LAST				127
>> +#define OPAL_IMC_COUNTERS_INIT			149
>> +#define OPAL_IMC_COUNTERS_START			150
>> +#define OPAL_IMC_COUNTERS_STOP			151
> Yay, this is heaps better!
>
>> +#define OPAL_LAST				151
>>   
>>   /* Device tree flags */
>>   
>> @@ -928,6 +931,13 @@ enum {
>>   	OPAL_PCI_TCE_KILL_ALL,
>>   };
>>   
>> +/* Argument to OPAL_IMC_COUNTERS_*  */
>> +enum {
>> +	OPAL_IMC_COUNTERS_NEST = 1,
>> +	OPAL_IMC_COUNTERS_CORE = 2,
>> +	OPAL_IMC_COUNTERS_THREAD = 3,
>> +};
>> +
>>   #endif /* __ASSEMBLY__ */
>>   
>>   #endif /* __OPAL_API_H */
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 1ff03a6..9c16ec6 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
>>   			  uint64_t dma_addr, uint32_t npages);
>>   int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
>>   
>> +int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
> This isn't called anywhere in this patch... including (worryingly) in
> the init function...
>
>> +int64_t opal_imc_counters_start(uint32_t type);
>> +int64_t opal_imc_counters_stop(uint32_t type);
>> +
>>   /* Internal functions */
>>   extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>   				   int depth, void *data);
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index f09a37a..40792424 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -18,6 +18,11 @@
>>   
>>   struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>> +static cpumask_t nest_imc_cpumask;
>> +
>> +static atomic_t nest_events;
>> +/* Used to avoid races in calling enable/disable nest-pmu units*/
> You need a space here between s and * ----------------------------^
>
>> +static DEFINE_MUTEX(imc_nest_reserve);
>>   
>>   /* Needed for sanity check */
>>   extern u64 nest_max_offset;
>> @@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
>>   	.attrs = imc_format_attrs,
>>   };
>>   
>> +/* Get the cpumask printed to a buffer "buf" */
>> +static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
>> +					struct device_attribute *attr,
>> +					char *buf)
>> +{
>> +	cpumask_t *active_mask;
>> +
>> +	active_mask = &nest_imc_cpumask;
>> +	return cpumap_print_to_pagebuf(true, buf, active_mask);
>> +}
>> +
>> +static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
>> +
>> +static struct attribute *imc_pmu_cpumask_attrs[] = {
>> +	&dev_attr_cpumask.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group imc_pmu_cpumask_attr_group = {
>> +	.attrs = imc_pmu_cpumask_attrs,
>> +};
>> +
>> +/*
>> + * nest_init : Initializes the nest imc engine for the current chip.
>> + * by default the nest engine is disabled.
>> + */
>> +static void nest_init(int *cpu_opal_rc)
>> +{
>> +	int rc;
>> +
>> +	/*
>> +	 * OPAL figures out which CPU to start based on the CPU that is
>> +	 * currently running when we call into OPAL
>> +	 */
>> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> Why isn't this the init call? If this is correct, a comment explaining it
> would be helpful.
>
>> +	if (rc)
>> +		cpu_opal_rc[smp_processor_id()] = 1;
>> +}
>> +
>
>> +static int nest_imc_control(int operation)
>> +{
>> +	int *cpus_opal_rc, cpu;
>> +
>> +	/*
>> +	 * Memory for OPAL call return value.
>> +	 */
>> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
>> +	if (!cpus_opal_rc)
>> +		return -ENOMEM;
>> +	switch (operation) {
>> +
>> +	case	IMC_COUNTER_ENABLE:
>> +			/* Initialize Nest PMUs in each node using designated cpus */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	case	IMC_COUNTER_DISABLE:
>> +			/* Disable the counters */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	default: return -EINVAL;
>> +
>> +	}
>> +
>> +	/* Check return value array for any OPAL call failure */
>> +	for_each_cpu(cpu, &nest_imc_cpumask) {
>> +		if (cpus_opal_rc[cpu])
>> +			return -ENODEV;
>> +	}
>> +	return 0;
>> +}
> Two things:
>
>   - It doesn't look like you're freeing cpus_opal_rc anywhere - have I
>     missed it?
>
>   - Would it be better to split this function into two: so instead of
>     passing in `operation`, you just have a nest_imc_enable and
>     nest_imc_disable? All the call sites I can see call this with a
>     constant parameter anyway. Perhaps it could even be refactored into
>     nest_imc_event_start/stop and this method could be removed
>     entirely...
>
>     (I haven't checked if you use this in future patches or if it gets
>     expanded and makes sense to keep the function this way.)
>
>> +
>>   static void imc_event_start(struct perf_event *event, int flags)
>>   {
>>   	/*
>> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>>   	imc_perf_event_update(event);
>>   }
>>   
>> -/*
>> - * The wrapper function is provided here, since we will have reserve
>> - * and release lock for imc_event_start() in the following patch.
>> - * Same in case of imc_event_stop().
>> - */
>>   static void nest_imc_event_start(struct perf_event *event, int flags)
>>   {
>> +	int rc;
>> +
>> +	/*
>> +	 * Nest pmu units are enabled only when it is used.
>> +	 * See if this is triggered for the first time.
>> +	 * If yes, take the mutex lock and enable the nest counters.
>> +	 * If not, just increment the count in nest_events.
>> +	 */
>> +	if (atomic_inc_return(&nest_events) == 1) {
>> +		mutex_lock(&imc_nest_reserve);
>> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
>> +		mutex_unlock(&imc_nest_reserve);
>> +		if (rc)
>> +			pr_err("IMC: Unbale to start the counters\n");
> Spelling: s/Unbale/Unable/ ----------^
>
>> +	}
>>   	imc_event_start(event, flags);
>>   }
>>
> Overall I'm much happier with this now, good work :)
>
> Regards,
> Daniel
>

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-08 14:12   ` Daniel Axtens
  2017-05-09  6:10     ` Madhavan Srinivasan
@ 2017-05-09 10:54     ` Anju T Sudhakar
  1 sibling, 0 replies; 26+ messages in thread
From: Anju T Sudhakar @ 2017-05-09 10:54 UTC (permalink / raw)
  To: Daniel Axtens, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, eranian, hemant, maddy

Hi Daniel,


On Monday 08 May 2017 07:42 PM, Daniel Axtens wrote:
> Hi all,
>
> I've had a look at the API as it was a big thing I didn't like in the
> earlier version.
>
> I am much happier with this one.
>
> Some comments:
>
>   - I'm no longer subscribed to skiboot but I've had a look at the
>     patches on that side:
>
>      * in patch 9 should opal_imc_counters_init return something other
>        than OPAL_SUCCESS in the case of invalid arguments? Maybe
>        OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>
>      * in start/stop, should there be some sort of write barrier to make
>        sure the cb->imc_chip_command actually gets written out to memory
>        at the time we expect?
>
> The rest of my comments are in line.
>
>> Adds cpumask attribute to be used by each IMC pmu. Only one cpu (any
>> online CPU) from each chip for nest PMUs is designated to read counters.
>>
>> On CPU hotplug, dying CPU is checked to see whether it is one of the
>> designated cpus, if yes, next online cpu from the same chip (for nest
>> units) is designated as new cpu to read counters. For this purpose, we
>> introduce a new state : CPUHP_AP_PERF_POWERPC_NEST_ONLINE.
>>
>> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
>> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/imc-pmu.h             |   4 +
>>   arch/powerpc/include/asm/opal-api.h            |  12 +-
>>   arch/powerpc/include/asm/opal.h                |   4 +
>>   arch/powerpc/perf/imc-pmu.c                    | 248 ++++++++++++++++++++++++-
>>   arch/powerpc/platforms/powernv/opal-wrappers.S |   3 +
>>   include/linux/cpuhotplug.h                     |   1 +
> Who owns this? get_maintainer.pl doesn't give me anything helpful
> here... Do we need an Ack from anyone?
>
>>   6 files changed, 266 insertions(+), 6 deletions(-)
>>
>> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
>> index 6bbe184..1478d0f 100644
>> --- a/arch/powerpc/include/asm/imc-pmu.h
>> +++ b/arch/powerpc/include/asm/imc-pmu.h
>> @@ -92,6 +92,10 @@ struct imc_pmu {
>>   #define IMC_DOMAIN_NEST		1
>>   #define IMC_DOMAIN_UNKNOWN	-1
>>   
>> +#define IMC_COUNTER_ENABLE	1
>> +#define IMC_COUNTER_DISABLE	0
> I'm not sure these constants are particularly useful any more, but I'll
> have more to say on that later.
>
>> +
>> +
>>   extern struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>>   extern int __init init_imc_pmu(struct imc_events *events,int idx, struct imc_pmu *pmu_ptr);
>> diff --git a/arch/powerpc/include/asm/opal-api.h b/arch/powerpc/include/asm/opal-api.h
>> index a0aa285..ce863d9 100644
>> --- a/arch/powerpc/include/asm/opal-api.h
>> +++ b/arch/powerpc/include/asm/opal-api.h
>> @@ -168,7 +168,10 @@
>>   #define OPAL_INT_SET_MFRR			125
>>   #define OPAL_PCI_TCE_KILL			126
>>   #define OPAL_NMMU_SET_PTCR			127
>> -#define OPAL_LAST				127
>> +#define OPAL_IMC_COUNTERS_INIT			149
>> +#define OPAL_IMC_COUNTERS_START			150
>> +#define OPAL_IMC_COUNTERS_STOP			151
> Yay, this is heaps better!
>
>> +#define OPAL_LAST				151
>>   
>>   /* Device tree flags */
>>   
>> @@ -928,6 +931,13 @@ enum {
>>   	OPAL_PCI_TCE_KILL_ALL,
>>   };
>>   
>> +/* Argument to OPAL_IMC_COUNTERS_*  */
>> +enum {
>> +	OPAL_IMC_COUNTERS_NEST = 1,
>> +	OPAL_IMC_COUNTERS_CORE = 2,
>> +	OPAL_IMC_COUNTERS_THREAD = 3,
>> +};
>> +
>>   #endif /* __ASSEMBLY__ */
>>   
>>   #endif /* __OPAL_API_H */
>> diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h
>> index 1ff03a6..9c16ec6 100644
>> --- a/arch/powerpc/include/asm/opal.h
>> +++ b/arch/powerpc/include/asm/opal.h
>> @@ -227,6 +227,10 @@ int64_t opal_pci_tce_kill(uint64_t phb_id, uint32_t kill_type,
>>   			  uint64_t dma_addr, uint32_t npages);
>>   int64_t opal_nmmu_set_ptcr(uint64_t chip_id, uint64_t ptcr);
>>   
>> +int64_t opal_imc_counters_init(uint32_t type, uint64_t address);
> This isn't called anywhere in this patch... including (worryingly) in
> the init function...
>
>> +int64_t opal_imc_counters_start(uint32_t type);
>> +int64_t opal_imc_counters_stop(uint32_t type);
>> +
>>   /* Internal functions */
>>   extern int early_init_dt_scan_opal(unsigned long node, const char *uname,
>>   				   int depth, void *data);
>> diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
>> index f09a37a..40792424 100644
>> --- a/arch/powerpc/perf/imc-pmu.c
>> +++ b/arch/powerpc/perf/imc-pmu.c
>> @@ -18,6 +18,11 @@
>>   
>>   struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>>   struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
>> +static cpumask_t nest_imc_cpumask;
>> +
>> +static atomic_t nest_events;
>> +/* Used to avoid races in calling enable/disable nest-pmu units*/
> You need a space here between s and * ----------------------------^

Sure. I will correct this.  :)
>
>> +static DEFINE_MUTEX(imc_nest_reserve);
>>   
>>   /* Needed for sanity check */
>>   extern u64 nest_max_offset;
>> @@ -33,6 +38,160 @@ static struct attribute_group imc_format_group = {
>>   	.attrs = imc_format_attrs,
>>   };
>>   
>> +/* Get the cpumask printed to a buffer "buf" */
>> +static ssize_t imc_pmu_cpumask_get_attr(struct device *dev,
>> +					struct device_attribute *attr,
>> +					char *buf)
>> +{
>> +	cpumask_t *active_mask;
>> +
>> +	active_mask = &nest_imc_cpumask;
>> +	return cpumap_print_to_pagebuf(true, buf, active_mask);
>> +}
>> +
>> +static DEVICE_ATTR(cpumask, S_IRUGO, imc_pmu_cpumask_get_attr, NULL);
>> +
>> +static struct attribute *imc_pmu_cpumask_attrs[] = {
>> +	&dev_attr_cpumask.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group imc_pmu_cpumask_attr_group = {
>> +	.attrs = imc_pmu_cpumask_attrs,
>> +};
>> +
>> +/*
>> + * nest_init : Initializes the nest imc engine for the current chip.
>> + * by default the nest engine is disabled.
>> + */
>> +static void nest_init(int *cpu_opal_rc)
>> +{
>> +	int rc;
>> +
>> +	/*
>> +	 * OPAL figures out which CPU to start based on the CPU that is
>> +	 * currently running when we call into OPAL
>> +	 */
>> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> Why isn't this the init call? If this is correct, a comment explaning it
> would be helpful.

Basically, the init call is meant for any hardware initialization that we need.
Here, in the case of nest, we disable the counters initially and enable them
only during perf init.

I will document this here.
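
For instance, the comment could read something like this (just a sketch, not
the final wording):

```c
/*
 * The nest engine should be left disabled at boot, so nest_init()
 * issues opal_imc_counters_stop() on one designated CPU per chip.
 * The counters are only enabled later, from the perf event path,
 * the first time a nest event is started.
 */
```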

>> +	if (rc)
>> +		cpu_opal_rc[smp_processor_id()] = 1;
>> +}
>> +
>
>> +static int nest_imc_control(int operation)
>> +{
>> +	int *cpus_opal_rc, cpu;
>> +
>> +	/*
>> +	 * Memory for OPAL call return value.
>> +	 */
>> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
>> +	if (!cpus_opal_rc)
>> +		return -ENOMEM;
>> +	switch (operation) {
>> +
>> +	case	IMC_COUNTER_ENABLE:
>> +			/* Initialize Nest PMUs in each node using designated cpus */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	case	IMC_COUNTER_DISABLE:
>> +			/* Disable the counters */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	default: return -EINVAL;
>> +
>> +	}
>> +
>> +	/* Check return value array for any OPAL call failure */
>> +	for_each_cpu(cpu, &nest_imc_cpumask) {
>> +		if (cpus_opal_rc[cpu])
>> +			return -ENODEV;
>> +	}
>> +	return 0;
>> +}
> Two things:
>
>   - It doesn't look like you're freeing cpus_opal_rc anywhere - have I
>     missed it?

Yes, we have to free cpus_opal_rc here. Will fix this.
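
Something along these lines, perhaps (untested sketch, reusing the existing
names from the patch):

```c
	/* Check return value array for any OPAL call failure */
	for_each_cpu(cpu, &nest_imc_cpumask) {
		if (cpus_opal_rc[cpu]) {
			kfree(cpus_opal_rc);
			return -ENODEV;
		}
	}
	kfree(cpus_opal_rc);
	return 0;
```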
>   - Would it be better to split this function into two: so instead of
>     passing in `operation`, you just have a nest_imc_enable and
>     nest_imc_disable? All the call sites I can see call this with a
>     constant parameter anyway. Perhaps it could even be refactored into
>     nest_imc_event_start/stop and this method could be removed
>     entirely...
>
>     (I haven't checked if you use this in future patches or if it gets
>     expanded and makes sense to keep the function this way.)
>

We can avoid some code duplication if we wrap the nest-imc enable/disable
into a single function. That is why I preferred this approach. I can add
proper documentation here to avoid further confusion.


>> +
>>   static void imc_event_start(struct perf_event *event, int flags)
>>   {
>>   	/*
>> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>>   	imc_perf_event_update(event);
>>   }
>>   
>> -/*
>> - * The wrapper function is provided here, since we will have reserve
>> - * and release lock for imc_event_start() in the following patch.
>> - * Same in case of imc_event_stop().
>> - */
>>   static void nest_imc_event_start(struct perf_event *event, int flags)
>>   {
>> +	int rc;
>> +
>> +	/*
>> +	 * Nest pmu units are enabled only when it is used.
>> +	 * See if this is triggered for the first time.
>> +	 * If yes, take the mutex lock and enable the nest counters.
>> +	 * If not, just increment the count in nest_events.
>> +	 */
>> +	if (atomic_inc_return(&nest_events) == 1) {
>> +		mutex_lock(&imc_nest_reserve);
>> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
>> +		mutex_unlock(&imc_nest_reserve);
>> +		if (rc)
>> +			pr_err("IMC: Unbale to start the counters\n");
> Spelling: s/Unbale/Unable/ ----------^
Will correct it here. Thanks for mentioning it. :-)

>> +	}
>>   	imc_event_start(event, flags);
>>   }
>>
> Overall I'm much happer with this now, good work :)
>
> Regards,
> Daniel
>



Thanks and Regards,
-Anju

^ permalink raw reply	[flat|nested] 26+ messages in thread

* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-04 14:19 ` [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support Anju T Sudhakar
  2017-05-08 14:12   ` Daniel Axtens
@ 2017-05-10 12:09   ` Thomas Gleixner
  2017-05-10 23:40     ` Stephen Rothwell
  2017-05-15 10:12     ` Madhavan Srinivasan
  1 sibling, 2 replies; 26+ messages in thread
From: Thomas Gleixner @ 2017-05-10 12:09 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: mpe, LKML, linuxppc-dev, ego, bsingharora, Anton Blanchard,
	sukadev, mikey, stewart, dja, eranian, hemant, maddy,
	Peter Zijlstra

On Thu, 4 May 2017, Anju T Sudhakar wrote:
> +/*
> + * nest_init : Initializes the nest imc engine for the current chip.
> + * by default the nest engine is disabled.
> + */
> +static void nest_init(int *cpu_opal_rc)
> +{
> +	int rc;
> +
> +	/*
> +	 * OPAL figures out which CPU to start based on the CPU that is
> +	 * currently running when we call into OPAL

I have no idea what that comment tries to tell me and how it is related to
the init function or the invoked opal_imc_counters_stop() function.

> +	 */
> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> +	if (rc)
> +		cpu_opal_rc[smp_processor_id()] = 1;
> +}
> +
> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> +{
> +	int i;
> +
> +	for (i = 0;
> +	     (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> +		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> +							old_cpu, new_cpu);

Bah, this is horrible to read.

	struct imc_pmu **pn = per_nest_pmu_arr;
	int i;

	for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

Hmm?

> +}

> +
> +static int ppc_nest_imc_cpu_online(unsigned int cpu)
> +{
> +	int nid;
> +	const struct cpumask *l_cpumask;
> +	struct cpumask tmp_mask;

You should not allocate cpumask on stack unconditionally. Either make that
cpumask_var_t and use zalloc/free_cpumask_var() or simply make it

	static struct cpumask tmp_mask;

That's fine, because this is serialized by the hotplug code already.
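
For the cpumask_var_t variant, that would look roughly like this (sketch,
assuming GFP_KERNEL is fine in the hotplug callback):

```c
	cpumask_var_t tmp_mask;

	if (!zalloc_cpumask_var(&tmp_mask, GFP_KERNEL))
		return -ENOMEM;
	/* ... use tmp_mask ... */
	free_cpumask_var(tmp_mask);
```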

> +
> +	/* Find the cpumask of this node */
> +	nid = cpu_to_node(cpu);
> +	l_cpumask = cpumask_of_node(nid);
> +
> +	/*
> +	 * If any of the cpu from this node is already present in the mask,
> +	 * just return, if not, then set this cpu in the mask.
> +	 */
> +	if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
> +		cpumask_set_cpu(cpu, &nest_imc_cpumask);
> +		nest_change_cpu_context(-1, cpu);
> +		return 0;
> +	}
> +
> +	return 0;
> +}
> +
> +static int ppc_nest_imc_cpu_offline(unsigned int cpu)
> +{
> +	int nid, target = -1;
> +	const struct cpumask *l_cpumask;
> +
> +	/*
> +	 * Check in the designated list for this cpu. Dont bother
> +	 * if not one of them.
> +	 */
> +	if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
> +		return 0;
> +
> +	/*
> +	 * Now that this cpu is one of the designated,
> +	 * find a next cpu a) which is online and b) in same chip.
> +	 */
> +	nid = cpu_to_node(cpu);
> +	l_cpumask = cpumask_of_node(nid);
> +	target = cpumask_next(cpu, l_cpumask);
> +
> +	/*
> +	 * Update the cpumask with the target cpu and
> +	 * migrate the context if needed
> +	 */
> +	if (target >= 0 && target <= nr_cpu_ids) {
> +		cpumask_set_cpu(target, &nest_imc_cpumask);
> +		nest_change_cpu_context(cpu, target);
> +	}

What disables the perf context if this was the last CPU on the node?

> +	return 0;
> +}
> +
> +static int nest_pmu_cpumask_init(void)
> +{
> +	const struct cpumask *l_cpumask;
> +	int cpu, nid;
> +	int *cpus_opal_rc;
> +
> +	if (!cpumask_empty(&nest_imc_cpumask))
> +		return 0;

What's that for? Paranoia engineering?

> +
> +	/*
> +	 * Memory for OPAL call return value.
> +	 */
> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
> +	if (!cpus_opal_rc)
> +		goto fail;
> +
> +	/*
> +	 * Nest PMUs are per-chip counters. So designate a cpu
> +	 * from each chip for counter collection.
> +	 */
> +	for_each_online_node(nid) {
> +		l_cpumask = cpumask_of_node(nid);
> +
> +		/* designate first online cpu in this node */
> +		cpu = cpumask_first(l_cpumask);
> +		cpumask_set_cpu(cpu, &nest_imc_cpumask);
> +	}

This is all unprotected against CPU hotplug.

> +
> +	/* Initialize Nest PMUs in each node using designated cpus */
> +	on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
> +						(void *)cpus_opal_rc, 1);

What does this check on nodes which are not yet online and become online
later?

> +	/* Check return value array for any OPAL call failure */
> +	for_each_cpu(cpu, &nest_imc_cpumask) {

Races against CPU hotplug as well.

> +		if (cpus_opal_rc[cpu])
> +			goto fail;
> +	}

Leaks cpus_opal_rc.

> +
> +	cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
> +			  "POWER_NEST_IMC_ONLINE",

Please use a proper prefixed text here. 

> +			  ppc_nest_imc_cpu_online,
> +			  ppc_nest_imc_cpu_offline);
> +
> +	return 0;
> +
> +fail:
> +	if (cpus_opal_rc)
> +		kfree(cpus_opal_rc);
> +	return -ENODEV;

But this whole function is completely overengineered. If you make that
nest_init() part of the online function then this works also for nodes
which come online later and it simplifies to:

static int ppc_nest_imc_cpu_online(unsigned int cpu)
{
	const struct cpumask *l_cpumask;
	static struct cpumask tmp_mask;
	int res;

	/* Get the cpumask of this node */
	l_cpumask = cpumask_of_node(cpu_to_node(cpu));

	/*
	 * If this is not the first online CPU on this node, then IMC is
	 * initialized already.
	 */
	if (cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask))
		return 0;

	/*
	 * If this fails, IMC is not usable.
	 *
	 * FIXME: Add an understandable comment on what this actually does
	 * and why it can fail.
	 */
	res = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
	if (res)
		return res;

	/* Make this CPU the designated target for counter collection */
	cpumask_set_cpu(cpu, &nest_imc_cpumask);
	nest_change_cpu_context(-1, cpu);
	return 0;
}

static int nest_pmu_cpumask_init(void)
{
	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
				 "perf/powerpc/imc:online",
				 ppc_nest_imc_cpu_online,
				 ppc_nest_imc_cpu_offline);
}

Hmm?

> +static void nest_imc_start(int *cpu_opal_rc)
> +{
> +	int rc;
> +
> +	/* Enable nest engine */
> +	rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
> +	if (rc)
> +		cpu_opal_rc[smp_processor_id()] = 1;
> +
> +}
> +
> +static int nest_imc_control(int operation)
> +{
> +	int *cpus_opal_rc, cpu;
> +
> +	/*
> +	 * Memory for OPAL call return value.
> +	 */
> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
> +	if (!cpus_opal_rc)
> +		return -ENOMEM;

This function leaks cpus_opal_rc on each invocation. Great stuff!

> +	switch (operation) {
> +
> +	case	IMC_COUNTER_ENABLE:
> +			/* Initialize Nest PMUs in each node using designated cpus */
> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,

How is that supposed to work? This is called from interrupt disabled
context. on_each_cpu_mask() must not be called from there. And in the worst
case this is called not only from interrupt disabled context but from a smp
function call .....

Aside of that. What's the point of these type casts? If your function does
not have the signature of a smp function call function, then you should fix
that. If it has, then the type cast is just crap.

> +						(void *)cpus_opal_rc, 1);

Ditto for this. Casting to (void *) is pointless.

> +			break;
> +	case	IMC_COUNTER_DISABLE:
> +			/* Disable the counters */
> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
> +						(void *)cpus_opal_rc, 1);
> +			break;
> +	default: return -EINVAL;
> +
> +	}
> +
> +	/* Check return value array for any OPAL call failure */
> +	for_each_cpu(cpu, &nest_imc_cpumask) {

> +		if (cpus_opal_rc[cpu])
> +			return -ENODEV;

So this just checks whether the disable/enable was successful, but what's
the consequence? Counters stay half enabled or disabled depending on which
of the CPUs failed.

This whole result array dance is just useless as you are solely printing
stuff at the call site. So I assume those calls are not supposed to fail,
so you can do the print in the function itself and get rid of all this
cpus_opal_rc hackery. Though that's the least of your worries, see above.

> +	}
> +	return 0;
> +}
> +
>  static void imc_event_start(struct perf_event *event, int flags)
>  {
>  	/*
> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>  	imc_perf_event_update(event);
>  }
>  
> -/*
> - * The wrapper function is provided here, since we will have reserve
> - * and release lock for imc_event_start() in the following patch.
> - * Same in case of imc_event_stop().
> - */
>  static void nest_imc_event_start(struct perf_event *event, int flags)
>  {
> +	int rc;
> +
> +	/*
> +	 * Nest pmu units are enabled only when it is used.
> +	 * See if this is triggered for the first time.
> +	 * If yes, take the mutex lock and enable the nest counters.
> +	 * If not, just increment the count in nest_events.
> +	 */
> +	if (atomic_inc_return(&nest_events) == 1) {
> +		mutex_lock(&imc_nest_reserve);

How is that supposed to work? pmu->start() and pmu->stop() are called with
interrupts disabled. Locking a mutex there is a NONO.

> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
> +		mutex_unlock(&imc_nest_reserve);
> +		if (rc)
> +			pr_err("IMC: Unbale to start the counters\n");
> +	}
>  	imc_event_start(event, flags);
>  }
>  
>  static void nest_imc_event_stop(struct perf_event *event, int flags)
>  {
> +	int rc;
> +
>  	imc_event_stop(event, flags);
> +	/*
> +	 * See if we need to disable the nest PMU.
> +	 * If no events are currently in use, then we have to take a
> +	 * mutex to ensure that we don't race with another task doing
> +	 * enable or disable the nest counters.
> +	 */
> +	if (atomic_dec_return(&nest_events) == 0) {
> +		mutex_lock(&imc_nest_reserve);

See above.

> +		rc = nest_imc_control(IMC_COUNTER_DISABLE);
> +		mutex_unlock(&imc_nest_reserve);
> +		if (rc)
> +			pr_err("IMC: Disable counters failed\n");
> +	}
>  }

I have no idea how that survived any form of testing ....

Thanks,

	tglx


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-10 12:09   ` Thomas Gleixner
@ 2017-05-10 23:40     ` Stephen Rothwell
  2017-05-11  8:39       ` Thomas Gleixner
  2017-05-15 10:12     ` Madhavan Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Stephen Rothwell @ 2017-05-10 23:40 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Anju T Sudhakar, stewart, ego, mikey, maddy, Peter Zijlstra,
	LKML, eranian, hemant, Anton Blanchard, sukadev, linuxppc-dev,
	dja

Hi,

On Wed, 10 May 2017 14:09:53 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote:
>
> > +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> > +{
> > +	int i;
> > +
> > +	for (i = 0;
> > +	     (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> > +		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> > +							old_cpu, new_cpu);  
> 
> Bah, this is horrible to read.
> 
> 	struct imc_pmu **pn = per_nest_pmu_arr;
> 	int i;
> 
> 	for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
> 		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

(Just a bit of bike shedding ...)

Or even (since "i" is not used any more):

	struct imc_pmu **pn;

	for (pn = per_nest_pmu_arr;
	     pn < &per_nest_pmu_arr[IMC_MAX_PMUS] && *pn;
	     pn++)
		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

-- 
Cheers,
Stephen Rothwell


* Re: [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module
  2017-05-04 14:19 ` [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
@ 2017-05-11  7:49   ` Stewart Smith
  2017-05-12  4:36     ` Madhavan Srinivasan
  0 siblings, 1 reply; 26+ messages in thread
From: Stewart Smith @ 2017-05-11  7:49 UTC (permalink / raw)
  To: Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, dja, eranian, hemant, maddy, anju

Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> This patch does three things :
>  - Enables "opal.c" to create a platform device for the IMC interface
>    according to the appropriate compatibility string.
>  - Find the reserved-memory region details from the system device tree
>    and get the base address of HOMER (Reserved memory) region address for each chip.
>  - We also get the Nest PMU counter data offsets (in the HOMER region)
>    and their sizes. The offsets for the counters' data are fixed and
>    won't change from chip to chip.
>
> The device tree parsing logic is separated from the PMU creation
> functions (which is done in subsequent patches).
>
> Patch also adds a CONFIG_HV_PERF_IMC_CTRS for the IMC driver.
>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  arch/powerpc/platforms/powernv/Kconfig    |  10 +++
>  arch/powerpc/platforms/powernv/Makefile   |   1 +
>  arch/powerpc/platforms/powernv/opal-imc.c | 140 ++++++++++++++++++++++++++++++
>  arch/powerpc/platforms/powernv/opal.c     |  18 ++++
>  4 files changed, 169 insertions(+)
>  create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c
>
> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
> index 3a07e4d..1b90a98 100644
> --- a/arch/powerpc/platforms/powernv/Kconfig
> +++ b/arch/powerpc/platforms/powernv/Kconfig
> @@ -27,3 +27,13 @@ config OPAL_PRD
>  	help
>  	  This enables the opal-prd driver, a facility to run processor
>  	  recovery diagnostics on OpenPower machines
> +
> +config HV_PERF_IMC_CTRS
> +       bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
> +       default y
> +       depends on PERF_EVENTS && PPC_POWERNV
> +       help
> +	  Enable access to hypervisor supplied in-memory collection counters
> +	  in perf. IMC counters are available from Power9 systems.
> +
> +          If unsure, select Y.
> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
> index b5d98cb..715e531 100644
> --- a/arch/powerpc/platforms/powernv/Makefile
> +++ b/arch/powerpc/platforms/powernv/Makefile
> @@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
>  obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
>  obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
>  obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
> +obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
> new file mode 100644
> index 0000000..3a87000
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -0,0 +1,140 @@
> +/*
> + * OPAL IMC interface detection driver
> + * Supported on POWERNV platform
> + *
> + * Copyright	(C) 2017 Madhavan Srinivasan, IBM Corporation.
> + *		(C) 2017 Anju T Sudhakar, IBM Corporation.
> + *		(C) 2017 Hemant K Shaw, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or modify
> + * it under the terms of the GNU General Public License version 2 as
> + * published by the Free Software Foundation.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.
> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
> +#include <linux/fs.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/of_platform.h>
> +#include <linux/poll.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/crash_dump.h>
> +#include <asm/opal.h>
> +#include <asm/io.h>
> +#include <asm/uaccess.h>
> +#include <asm/cputable.h>
> +#include <asm/imc-pmu.h>
> +
> +struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
> +
> +/*
> + * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
> + */
> +static void __init imc_pmu_setup(struct device_node *parent)
> +{
> +	if (!parent)
> +		return;
> +}
> +
> +static int opal_imc_counters_probe(struct platform_device *pdev)
> +{
> +	struct device_node *imc_dev, *dn, *rm_node = NULL;
> +	struct perchip_nest_info *pcni;
> +	u32 pages, nest_offset, nest_size, chip_id;
> +	int i = 0;
> +	const __be32 *addrp;
> +	u64 reg_addr, reg_size;
> +
> +	if (!pdev || !pdev->dev.of_node)
> +		return -ENODEV;
> +
> +	/*
> +	 * Check whether this is kdump kernel. If yes, just return.
> +	 */
> +	if (is_kdump_kernel())
> +		return -ENODEV;
> +
> +	imc_dev = pdev->dev.of_node;
> +
> +	/*
> +	 * Nest counter data are saved in a reserved memory called HOMER.
> +	 * "imc-nest-offset" identifies the counter data location within HOMER.
> +	 * size : size of the entire nest-counters region
> +	 */
> +	if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
> +		goto err;
> +
> +	if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
> +		goto err;
> +
> +	/* Sanity check */
> +	if ((nest_size/PAGE_SIZE) > IMC_NEST_MAX_PAGES)
> +		goto err;
> +
> +	/* Find the "HOMER region" for each chip */
> +	rm_node = of_find_node_by_path("/reserved-memory");
> +	if (!rm_node)
> +		goto err;
> +
> +	/*
> +	 * We need to look for the "ibm,homer-image" node in the
> +	 * "/reserved-memory" node.
> +	 */
> +	for (dn = of_find_node_by_name(rm_node, "ibm,homer-image"); dn;
> +			dn = of_find_node_by_name(dn, "ibm,homer-image")) {
> +
> +		/* Get the chip id to which the above homer region belongs to */
> +		if (of_property_read_u32(dn, "ibm,chip-id", &chip_id))
> +			goto err;

So, I was thinking on this (and should probably comment on the firmware
side as well).

I'd prefer an OPAL interface where instead of looking up where
ibm,homer-image is, we provide the kernel with a base address and then
have offsets into it.

That way, we don't tie the kernel code to counters that are only in the
HOMER region.


-- 
Stewart Smith
OPAL Architect, IBM.


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-10 23:40     ` Stephen Rothwell
@ 2017-05-11  8:39       ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2017-05-11  8:39 UTC (permalink / raw)
  To: Stephen Rothwell
  Cc: Anju T Sudhakar, stewart, ego, mikey, maddy, Peter Zijlstra,
	LKML, eranian, hemant, Anton Blanchard, sukadev, linuxppc-dev,
	dja

On Thu, 11 May 2017, Stephen Rothwell wrote:

> Hi,
> 
> On Wed, 10 May 2017 14:09:53 +0200 (CEST) Thomas Gleixner <tglx@linutronix.de> wrote:
> >
> > > +static void nest_change_cpu_context(int old_cpu, int new_cpu)
> > > +{
> > > +	int i;
> > > +
> > > +	for (i = 0;
> > > +	     (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
> > > +		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
> > > +							old_cpu, new_cpu);  
> > 
> > Bah, this is horrible to read.
> > 
> > 	struct imc_pmu **pn = per_nest_pmu_arr;
> > 	int i;
> > 
> > 	for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
> > 		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
> 
> (Just a bit of bike shedding ...)
> 
> Or even (since "i" is not used any more):
> 
> 	struct imc_pmu **pn;
> 
> 	for (pn = per_nest_pmu_arr;
> 	     pn < &per_nest_pmu_arr[IMC_MAX_PMUS] && *pn;
> 	     pn++)
> 		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);

Which is just as unreadable as the original code I complained about. Is that
a corporate preference?

Thanks,

	tglx


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-09  6:10     ` Madhavan Srinivasan
@ 2017-05-12  2:18       ` Stewart Smith
  2017-05-12  3:33         ` Michael Ellerman
  2017-05-12  3:41         ` Madhavan Srinivasan
  0 siblings, 2 replies; 26+ messages in thread
From: Stewart Smith @ 2017-05-12  2:18 UTC (permalink / raw)
  To: Madhavan Srinivasan, Daniel Axtens, Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, eranian, hemant

Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
>>      * in patch 9 should opal_imc_counters_init return something other
>>        than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>        OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>
> So, init call will return OPAL_PARAMETER for the unsupported
> domains (core and nest are supported). And if the init operation
> fails for any reason, it would return OPAL_HARDWARE. And this is
> documented.

(I'll comment on the skiboot one too), but I think that if the class
exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
OPAL_SUCCESS and just do nothing. This future proofs everything, and the
API is that one *must* call _INIT before start.
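
In kernel terms the contract would then simply be (sketch, using the
prototypes added in this series; `addr` stands for whatever per-domain
argument the init call takes):

```c
	/* _INIT must always precede _START, even when it is a no-op */
	rc = opal_imc_counters_init(OPAL_IMC_COUNTERS_NEST, addr);
	if (rc != OPAL_SUCCESS)
		return -ENODEV;
	rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
```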


-- 
Stewart Smith
OPAL Architect, IBM.


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-12  2:18       ` Stewart Smith
@ 2017-05-12  3:33         ` Michael Ellerman
  2017-05-12  3:50           ` Madhavan Srinivasan
  2017-05-12  3:41         ` Madhavan Srinivasan
  1 sibling, 1 reply; 26+ messages in thread
From: Michael Ellerman @ 2017-05-12  3:33 UTC (permalink / raw)
  To: Stewart Smith, Madhavan Srinivasan, Daniel Axtens, Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, eranian, hemant

Stewart Smith <stewart@linux.vnet.ibm.com> writes:

> Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
>>>      * in patch 9 should opal_imc_counters_init return something other
>>>        than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>>        OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>>
>> So, init call will return OPAL_PARAMETER for the unsupported
>> domains (core and nest are supported). And if the init operation
>> fails for any reason, it would return OPAL_HARDWARE. And this is
>> documented.
>
> (I'll comment on the skiboot one too), but I think that if the class
> exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
> OPAL_SUCCESS and just do nothing. This future proofs everything, and the
> API is that one *must* call _INIT before start.

Yes, 100%.

That's what I described in my replies to a previous version, if it
doesn't do that we need to fix it.

cheers


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-12  2:18       ` Stewart Smith
  2017-05-12  3:33         ` Michael Ellerman
@ 2017-05-12  3:41         ` Madhavan Srinivasan
  1 sibling, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2017-05-12  3:41 UTC (permalink / raw)
  To: Stewart Smith, Daniel Axtens, Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, eranian, hemant



On Friday 12 May 2017 07:48 AM, Stewart Smith wrote:
> Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
>>>       * in patch 9 should opal_imc_counters_init return something other
>>>         than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>>         OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>> So, init call will return OPAL_PARAMETER for the unsupported
>> domains (core and nest are supported). And if the init operation
>> fails for any reason, it would return OPAL_HARDWARE. And this is
>> documented.
> (I'll comment on the skiboot one too), but I think that if the class
> exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
> OPAL_SUCCESS and just do nothing. This future proofs everything, and the
> API is that one *must* call _INIT before start.
Hi Stewart,

Yes, mpe did mention this in his review, and I have made the same change in
v11 of the opal patchset. Currently we return OPAL_SUCCESS from _INIT in the
case of type "Nest". Additionally, I have also added a message to be printed,
but I guess we can do away with that.

Maddy

>


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-12  3:33         ` Michael Ellerman
@ 2017-05-12  3:50           ` Madhavan Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2017-05-12  3:50 UTC (permalink / raw)
  To: Michael Ellerman, Stewart Smith, Daniel Axtens, Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, eranian, hemant



On Friday 12 May 2017 09:03 AM, Michael Ellerman wrote:
> Stewart Smith <stewart@linux.vnet.ibm.com> writes:
>
>> Madhavan Srinivasan <maddy@linux.vnet.ibm.com> writes:
>>>>       * in patch 9 should opal_imc_counters_init return something other
>>>>         than OPAL_SUCCESS in the case on invalid arguments? Maybe
>>>>         OPAL_PARAMETER? (I think you fix this in a later patch anyway?)
>>> So, init call will return OPAL_PARAMETER for the unsupported
>>> domains (core and nest are supported). And if the init operation
>>> fails for any reason, it would return OPAL_HARDWARE. And this is
>>> documented.
>> (I'll comment on the skiboot one too), but I think that if the class
>> exists but init is a no-op, then OPAL_IMC_COUNTERS_INIT should return
>> OPAL_SUCCESS and just do nothing. This future proofs everything, and the
>> API is that one *must* call _INIT before start.
> Yes, 100%.
>
> That's what I described in my replies to a previous version, if it
> doesn't do that we need to fix it.
Hi mpe,

Yes, as you suggested, in the opal v11 patchset we return OPAL_SUCCESS
from _INIT for type "Nest". I have also added a prerror message for
debugging, but we can do away with it or make it a prlog.

Maddy
>
> cheers
>


* Re: [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module
  2017-05-11  7:49   ` Stewart Smith
@ 2017-05-12  4:36     ` Madhavan Srinivasan
  0 siblings, 0 replies; 26+ messages in thread
From: Madhavan Srinivasan @ 2017-05-12  4:36 UTC (permalink / raw)
  To: Stewart Smith, Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, dja, eranian, hemant



On Thursday 11 May 2017 01:19 PM, Stewart Smith wrote:
> Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
>> This patch does three things :
>>   - Enables "opal.c" to create a platform device for the IMC interface
>>     according to the appropriate compatibility string.
>>   - Find the reserved-memory region details from the system device tree
>>     and get the base address of HOMER (Reserved memory) region address for each chip.
>>   - We also get the Nest PMU counter data offsets (in the HOMER region)
>>     and their sizes. The offsets for the counters' data are fixed and
>>     won't change from chip to chip.
>>
>> The device tree parsing logic is separated from the PMU creation
>> functions (which is done in subsequent patches).
>>
>> Patch also adds a CONFIG_HV_PERF_IMC_CTRS for the IMC driver.
>>
>> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
>> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
>> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/platforms/powernv/Kconfig    |  10 +++
>>   arch/powerpc/platforms/powernv/Makefile   |   1 +
>>   arch/powerpc/platforms/powernv/opal-imc.c | 140 ++++++++++++++++++++++++++++++
>>   arch/powerpc/platforms/powernv/opal.c     |  18 ++++
>>   4 files changed, 169 insertions(+)
>>   create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c
>>
>> diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
>> index 3a07e4d..1b90a98 100644
>> --- a/arch/powerpc/platforms/powernv/Kconfig
>> +++ b/arch/powerpc/platforms/powernv/Kconfig
>> @@ -27,3 +27,13 @@ config OPAL_PRD
>>   	help
>>   	  This enables the opal-prd driver, a facility to run processor
>>   	  recovery diagnostics on OpenPower machines
>> +
>> +config HV_PERF_IMC_CTRS
>> +       bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
>> +       default y
>> +       depends on PERF_EVENTS && PPC_POWERNV
>> +       help
>> +	  Enable access to hypervisor supplied in-memory collection counters
>> +	  in perf. IMC counters are available from Power9 systems.
>> +
>> +          If unsure, select Y.
>> diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
>> index b5d98cb..715e531 100644
>> --- a/arch/powerpc/platforms/powernv/Makefile
>> +++ b/arch/powerpc/platforms/powernv/Makefile
>> @@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
>>   obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
>>   obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
>>   obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
>> +obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
>> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
>> new file mode 100644
>> index 0000000..3a87000
>> --- /dev/null
>> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
>> @@ -0,0 +1,140 @@
>> +/*
>> + * OPAL IMC interface detection driver
>> + * Supported on POWERNV platform
>> + *
>> + * Copyright	(C) 2017 Madhavan Srinivasan, IBM Corporation.
>> + *		(C) 2017 Anju T Sudhakar, IBM Corporation.
>> + *		(C) 2017 Hemant K Shaw, IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or modify
>> + * it under the terms of the GNU General Public License version 2 as
>> + * published by the Free Software Foundation.
>> + *
>> + * This program is distributed in the hope that it will be useful,
>> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
>> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> + * GNU General Public License for more details.
>> + */
>> +#include <linux/kernel.h>
>> +#include <linux/module.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/miscdevice.h>
>> +#include <linux/fs.h>
>> +#include <linux/of.h>
>> +#include <linux/of_address.h>
>> +#include <linux/of_platform.h>
>> +#include <linux/poll.h>
>> +#include <linux/mm.h>
>> +#include <linux/slab.h>
>> +#include <linux/crash_dump.h>
>> +#include <asm/opal.h>
>> +#include <asm/io.h>
>> +#include <asm/uaccess.h>
>> +#include <asm/cputable.h>
>> +#include <asm/imc-pmu.h>
>> +
>> +struct perchip_nest_info nest_perchip_info[IMC_MAX_CHIPS];
>> +
>> +/*
>> + * imc_pmu_setup : Setup the IMC PMUs (children of "parent").
>> + */
>> +static void __init imc_pmu_setup(struct device_node *parent)
>> +{
>> +	if (!parent)
>> +		return;
>> +}
>> +
>> +static int opal_imc_counters_probe(struct platform_device *pdev)
>> +{
>> +	struct device_node *imc_dev, *dn, *rm_node = NULL;
>> +	struct perchip_nest_info *pcni;
>> +	u32 pages, nest_offset, nest_size, chip_id;
>> +	int i = 0;
>> +	const __be32 *addrp;
>> +	u64 reg_addr, reg_size;
>> +
>> +	if (!pdev || !pdev->dev.of_node)
>> +		return -ENODEV;
>> +
>> +	/*
>> +	 * Check whether this is kdump kernel. If yes, just return.
>> +	 */
>> +	if (is_kdump_kernel())
>> +		return -ENODEV;
>> +
>> +	imc_dev = pdev->dev.of_node;
>> +
>> +	/*
>> +	 * Nest counter data are saved in a reserved memory called HOMER.
>> +	 * "imc-nest-offset" identifies the counter data location within HOMER.
>> +	 * size : size of the entire nest-counters region
>> +	 */
>> +	if (of_property_read_u32(imc_dev, "imc-nest-offset", &nest_offset))
>> +		goto err;
>> +
>> +	if (of_property_read_u32(imc_dev, "imc-nest-size", &nest_size))
>> +		goto err;
>> +
>> +	/* Sanity check */
>> +	if ((nest_size/PAGE_SIZE) > IMC_NEST_MAX_PAGES)
>> +		goto err;
>> +
>> +	/* Find the "HOMER region" for each chip */
>> +	rm_node = of_find_node_by_path("/reserved-memory");
>> +	if (!rm_node)
>> +		goto err;
>> +
>> +	/*
>> +	 * We need to look for the "ibm,homer-image" node in the
>> +	 * "/reserved-memory" node.
>> +	 */
>> +	for (dn = of_find_node_by_name(rm_node, "ibm,homer-image"); dn;
>> +			dn = of_find_node_by_name(dn, "ibm,homer-image")) {
>> +
>> +		/* Get the chip id to which the above homer region belongs to */
>> +		if (of_property_read_u32(dn, "ibm,chip-id", &chip_id))
>> +			goto err;
> So, I was thinking on this (and should probably comment on the firmware
> side as well).
>
> I'd prefer an OPAL interface where instead of looking up where
> ibm,homer-image is, we provide the kernel with a base address and then
> have offsets into it.
>
> That way, we don't tie the kernel code to counters that are only in the
> HOMER region.

Yes, this makes sense. Would adding something like this to the IMC node
be fine?

     chip@<id> {
         base_addr = < addr >;
         ibm,chip-id = < id>;
     };

Maddy

>
>


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-10 12:09   ` Thomas Gleixner
  2017-05-10 23:40     ` Stephen Rothwell
@ 2017-05-15 10:12     ` Madhavan Srinivasan
  2017-05-15 11:06       ` Thomas Gleixner
  1 sibling, 1 reply; 26+ messages in thread
From: Madhavan Srinivasan @ 2017-05-15 10:12 UTC (permalink / raw)
  To: Thomas Gleixner, Anju T Sudhakar
  Cc: mpe, LKML, linuxppc-dev, ego, bsingharora, Anton Blanchard,
	sukadev, mikey, stewart, dja, eranian, hemant, Peter Zijlstra

Sorry for the delayed response.


On Wednesday 10 May 2017 05:39 PM, Thomas Gleixner wrote:
> On Thu, 4 May 2017, Anju T Sudhakar wrote:
>> +/*
>> + * nest_init : Initializes the nest imc engine for the current chip.
>> + * by default the nest engine is disabled.
>> + */
>> +static void nest_init(int *cpu_opal_rc)
>> +{
>> +	int rc;
>> +
>> +	/*
>> +	 * OPAL figures out which CPU to start based on the CPU that is
>> +	 * currently running when we call into OPAL
> I have no idea what that comment tries to tell me and how it is related to
> the init function or the invoked opal_imc_counters_stop() function.

Yep will fix the comment.

>
>> +	 */
>> +	rc = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
>> +	if (rc)
>> +		cpu_opal_rc[smp_processor_id()] = 1;
>> +}
>> +
>> +static void nest_change_cpu_context(int old_cpu, int new_cpu)
>> +{
>> +	int i;
>> +
>> +	for (i = 0;
>> +	     (per_nest_pmu_arr[i] != NULL) && (i < IMC_MAX_PMUS); i++)
>> +		perf_pmu_migrate_context(&per_nest_pmu_arr[i]->pmu,
>> +							old_cpu, new_cpu);
> Bah, this is horrible to read.
>
> 	struct imc_pmu **pn = per_nest_pmu_arr;
> 	int i;
>
> 	for (i = 0; *pn && i < IMC_MAX_PMUS; i++, pn++)
> 		perf_pmu_migrate_context(&(*pn)->pmu, old_cpu, new_cpu);
>
> Hmm?

Yes this is better. Will update the code.

>
>> +}
>> +
>> +static int ppc_nest_imc_cpu_online(unsigned int cpu)
>> +{
>> +	int nid;
>> +	const struct cpumask *l_cpumask;
>> +	struct cpumask tmp_mask;
> You should not allocate cpumask on stack unconditionally. Either make that
> cpumask_var_t and use zalloc/free_cpumask_var() or simply make it
>
> 	static struct cpumask tmp_mask;
>
> That's fine, because this is serialized by the hotplug code already.

ok will fix it as suggested.

>
>> +
>> +	/* Find the cpumask of this node */
>> +	nid = cpu_to_node(cpu);
>> +	l_cpumask = cpumask_of_node(nid);
>> +
>> +	/*
>> +	 * If any of the cpu from this node is already present in the mask,
>> +	 * just return, if not, then set this cpu in the mask.
>> +	 */
>> +	if (!cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask)) {
>> +		cpumask_set_cpu(cpu, &nest_imc_cpumask);
>> +		nest_change_cpu_context(-1, cpu);
>> +		return 0;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int ppc_nest_imc_cpu_offline(unsigned int cpu)
>> +{
>> +	int nid, target = -1;
>> +	const struct cpumask *l_cpumask;
>> +
>> +	/*
>> +	 * Check in the designated list for this cpu. Dont bother
>> +	 * if not one of them.
>> +	 */
>> +	if (!cpumask_test_and_clear_cpu(cpu, &nest_imc_cpumask))
>> +		return 0;
>> +
>> +	/*
>> +	 * Now that this cpu is one of the designated,
>> +	 * find a next cpu a) which is online and b) in same chip.
>> +	 */
>> +	nid = cpu_to_node(cpu);
>> +	l_cpumask = cpumask_of_node(nid);
>> +	target = cpumask_next(cpu, l_cpumask);
>> +
>> +	/*
>> +	 * Update the cpumask with the target cpu and
>> +	 * migrate the context if needed
>> +	 */
>> +	if (target >= 0 && target <= nr_cpu_ids) {
>> +		cpumask_set_cpu(target, &nest_imc_cpumask);
>> +		nest_change_cpu_context(cpu, target);
>> +	}
> What disables the perf context if this was the last CPU on the node?

My bad, I did not understand this. Is this regarding the updates
of the "flags" in the perf_event and hw_perf_event structs?


>
>> +	return 0;
>> +}
>> +
>> +static int nest_pmu_cpumask_init(void)
>> +{
>> +	const struct cpumask *l_cpumask;
>> +	int cpu, nid;
>> +	int *cpus_opal_rc;
>> +
>> +	if (!cpumask_empty(&nest_imc_cpumask))
>> +		return 0;
> What's that for? Paranoia engineering?

No. The idea here is to generate the cpu_mask attribute
field only for the first "nest" PMU and reuse it for the
other "nest" units.

>> +
>> +	/*
>> +	 * Memory for OPAL call return value.
>> +	 */
>> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
>> +	if (!cpus_opal_rc)
>> +		goto fail;
>> +
>> +	/*
>> +	 * Nest PMUs are per-chip counters. So designate a cpu
>> +	 * from each chip for counter collection.
>> +	 */
>> +	for_each_online_node(nid) {
>> +		l_cpumask = cpumask_of_node(nid);
>> +
>> +		/* designate first online cpu in this node */
>> +		cpu = cpumask_first(l_cpumask);
>> +		cpumask_set_cpu(cpu, &nest_imc_cpumask);
>> +	}
> This is all unprotected against CPU hotplug.
>
>> +
>> +	/* Initialize Nest PMUs in each node using designated cpus */
>> +	on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
>> +						(void *)cpus_opal_rc, 1);
> What does this check on nodes which are not yet online and become online
> later?

The idea here is not to have the nest engines running at all times, but
to enable/start them only when needed; so at the init stage of the PMU
setup, we disable them. That said, an OPAL call to disable the engine is
also needed for any new node that comes online at a later stage. Nice
catch, thanks.

>
>> +	/* Check return value array for any OPAL call failure */
>> +	for_each_cpu(cpu, &nest_imc_cpumask) {
> Races against CPU hotplug as well.
>
>> +		if (cpus_opal_rc[cpu])
>> +			goto fail;
>> +	}
> Leaks cpus_opal_rc.
>
>> +
>> +	cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
>> +			  "POWER_NEST_IMC_ONLINE",
> Please use a proper prefixed text here.
>
>> +			  ppc_nest_imc_cpu_online,
>> +			  ppc_nest_imc_cpu_offline);
>> +
>> +	return 0;
>> +
>> +fail:
>> +	if (cpus_opal_rc)
>> +		kfree(cpus_opal_rc);
>> +	return -ENODEV;
> But this whole function is completely overengineered. If you make that
> nest_init() part of the online function then this works also for nodes
> which come online later and it simplifies to:
>
> static int ppc_nest_imc_cpu_online(unsigned int cpu)
> {
> 	const struct cpumask *l_cpumask;
> 	static struct cpumask tmp_mask;
> 	int res;
>
> 	/* Get the cpumask of this node */
> 	l_cpumask = cpumask_of_node(cpu_to_node(cpu));
>
> 	/*
> 	 * If this is not the first online CPU on this node, then IMC is
> 	 * initialized already.
> 	 */
> 	if (cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask))
> 		return 0;
>
> 	/*
> 	 * If this fails, IMC is not usable.
> 	 *
> 	 * FIXME: Add a understandable comment what this actually does
> 	 * and why it can fail.
> 	 */
> 	res = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> 	if (res)
> 		return res;
>
> 	/* Make this CPU the designated target for counter collection */
> 	cpumask_set_cpu(cpu, &nest_imc_cpumask);
> 	nest_change_cpu_context(-1, cpu);
> 	return 0;
> }
>
> static int nest_pmu_cpumask_init(void)
> {
> 	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
> 				 "perf/powerpc/imc:online",
> 				 ppc_nest_imc_cpu_online,
> 				 ppc_nest_imc_cpu_offline);
> }
>
> Hmm?

Yes, this makes sense. But we first need to designate a CPU in each
chip at init time and use the OPAL API to disable the engine on it.
So could we do that after cpuhp_setup_state()?

>
>> +static void nest_imc_start(int *cpu_opal_rc)
>> +{
>> +	int rc;
>> +
>> +	/* Enable nest engine */
>> +	rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_NEST);
>> +	if (rc)
>> +		cpu_opal_rc[smp_processor_id()] = 1;
>> +
>> +}
>> +
>> +static int nest_imc_control(int operation)
>> +{
>> +	int *cpus_opal_rc, cpu;
>> +
>> +	/*
>> +	 * Memory for OPAL call return value.
>> +	 */
>> +	cpus_opal_rc = kzalloc((sizeof(int) * nr_cpu_ids), GFP_KERNEL);
>> +	if (!cpus_opal_rc)
>> +		return -ENOMEM;
> This function leaks cpus_opal_rc on each invocation. Great stuff!

Yep, have fixed it; Daniel also pointed out the
same in his review comments.

>
>> +	switch (operation) {
>> +
>> +	case	IMC_COUNTER_ENABLE:
>> +			/* Initialize Nest PMUs in each node using designated cpus */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_imc_start,
> How is that supposed to work? This is called from interrupt disabled
> context. on_each_cpu_mask() must not be called from there. And in the worst
> case this is called not only from interrupt disabled context but from a smp
> function call .....

The intent here is to make sure the nest engines are running/enabled
only when needed and disabled the rest of the time. So we added
reserve/release logic to enable/disable the nest engine in the recent
version of the patchset. And my bad, we could have tested this logic
better.

That said, we did hit the WARN_ON_ONCE in smp_call_* with this version
of the patchset, so you were spot on. We have therefore moved the
reserve/release logic to *event_init() in the latest internal version
of the patchset, something like this:

@@ -74,7 +305,20 @@ static int nest_imc_event_init(struct perf_event *event)
  	 */
  	event->hw.event_base = pcni->vbase[config/PAGE_SIZE] +
  							(config & ~PAGE_MASK);
-
+	/*
+	 * Nest pmu units are enabled only when it is used.
+	 * See if this is triggered for the first time.
+	 * If yes, take the mutex lock and enable the nest counters.
+	 * If not, just increment the count in nest_events.
+	 */
+	if (atomic_inc_return(&nest_events) == 1) {
+		mutex_lock(&imc_nest_reserve);
+		rc = nest_imc_control(IMC_COUNTER_ENABLE);
+		mutex_unlock(&imc_nest_reserve);
+		if (rc) {
+			pr_err("IMC: Unable to start the counters\n");
+			return -EBUSY;
+		}
+	}
+	event->destroy = nest_imc_counters_release;
  	return 0;
  }


> Aside of that. What's the point of these type casts? If your function does
> not have the signature of a smp function call function, then you should fix
> that. If it has, then the type cast is just crap.

yes. Will fix the casting part.

>
>> +						(void *)cpus_opal_rc, 1);
> Ditto for this. Casting to (void *) is pointless.
>
>> +			break;
>> +	case	IMC_COUNTER_DISABLE:
>> +			/* Disable the counters */
>> +			on_each_cpu_mask(&nest_imc_cpumask, (smp_call_func_t)nest_init,
>> +						(void *)cpus_opal_rc, 1);
>> +			break;
>> +	default: return -EINVAL;
>> +
>> +	}
>> +
>> +	/* Check return value array for any OPAL call failure */
>> +	for_each_cpu(cpu, &nest_imc_cpumask) {
>> +		if (cpus_opal_rc[cpu])
>> +			return -ENODEV;
> So this just checks whether the disable/enable was successful, but what's
> the consequence? Counters stay half enabled or disabled depending on which
> of the CPUs failed.
>
> This whole result array dance is just useless as you are solely printing
> stuff at the call site. So I assume those calls are not supposed to fail,
> so you can do the print in the function itself and get rid of all this
> cpus_opal_rc hackery. Though that's the least of your worries, see above.

Yes. As mentioned before, we have moved the reserve/release logic to
*event_init(). Apart from printing a debug message on OPAL call
failure, *event_init() also returns -EBUSY in that case.

>> +	}
>> +	return 0;
>> +}
>> +
>>   static void imc_event_start(struct perf_event *event, int flags)
>>   {
>>   	/*
>> @@ -129,19 +333,44 @@ static void imc_event_stop(struct perf_event *event, int flags)
>>   	imc_perf_event_update(event);
>>   }
>>   
>> -/*
>> - * The wrapper function is provided here, since we will have reserve
>> - * and release lock for imc_event_start() in the following patch.
>> - * Same in case of imc_event_stop().
>> - */
>>   static void nest_imc_event_start(struct perf_event *event, int flags)
>>   {
>> +	int rc;
>> +
>> +	/*
>> +	 * Nest pmu units are enabled only when it is used.
>> +	 * See if this is triggered for the first time.
>> +	 * If yes, take the mutex lock and enable the nest counters.
>> +	 * If not, just increment the count in nest_events.
>> +	 */
>> +	if (atomic_inc_return(&nest_events) == 1) {
>> +		mutex_lock(&imc_nest_reserve);
> How is that supposed to work? pmu->start() and pmu->stop() are called with
> interrupts disabled. Locking a mutex there is a NONO.
>
>> +		rc = nest_imc_control(IMC_COUNTER_ENABLE);
>> +		mutex_unlock(&imc_nest_reserve);
>> +		if (rc)
>> +			pr_err("IMC: Unbale to start the counters\n");
>> +	}
>>   	imc_event_start(event, flags);
>>   }
>>   
>>   static void nest_imc_event_stop(struct perf_event *event, int flags)
>>   {
>> +	int rc;
>> +
>>   	imc_event_stop(event, flags);
>> +	/*
>> +	 * See if we need to disable the nest PMU.
>> +	 * If no events are currently in use, then we have to take a
>> +	 * mutex to ensure that we don't race with another task doing
>> +	 * enable or disable the nest counters.
>> +	 */
>> +	if (atomic_dec_return(&nest_events) == 0) {
>> +		mutex_lock(&imc_nest_reserve);
> See above.
>
>> +		rc = nest_imc_control(IMC_COUNTER_DISABLE);
>> +		mutex_unlock(&imc_nest_reserve);
>> +		if (rc)
>> +			pr_err("IMC: Disable counters failed\n");
>> +	}
>>   }
> I have no idea how that survived any form of testing ....

Yes, we could have tested the reserve/release logic better. That said,
we have identified and fixed the issues in the latest internal version
of the patchset with more test runs, and will post it out soon.

Thanks for your review comments.

Maddy
>
> Thanks,
>
> 	tglx
>


* Re: [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support
  2017-05-15 10:12     ` Madhavan Srinivasan
@ 2017-05-15 11:06       ` Thomas Gleixner
  0 siblings, 0 replies; 26+ messages in thread
From: Thomas Gleixner @ 2017-05-15 11:06 UTC (permalink / raw)
  To: Madhavan Srinivasan
  Cc: Anju T Sudhakar, mpe, LKML, linuxppc-dev, ego, bsingharora,
	Anton Blanchard, sukadev, mikey, stewart, dja, eranian, hemant,
	Peter Zijlstra

On Mon, 15 May 2017, Madhavan Srinivasan wrote:
> On Wednesday 10 May 2017 05:39 PM, Thomas Gleixner wrote:
> > On Thu, 4 May 2017, Anju T Sudhakar wrote:
> > > +	/*
> > > +	 * Update the cpumask with the target cpu and
> > > +	 * migrate the context if needed
> > > +	 */
> > > +	if (target >= 0 && target <= nr_cpu_ids) {
> > > +		cpumask_set_cpu(target, &nest_imc_cpumask);
> > > +		nest_change_cpu_context(cpu, target);
> > > +	}
> > What disables the perf context if this was the last CPU on the node?
> 
> My bad. i did not understand this. Is this regarding the updates
> of the "flags" in the perf_event and hw_perf_event structs?

Sorry, there is nothing to understand. It was me misreading it.

> > > +static int nest_pmu_cpumask_init(void)
> > > +{
> > > +	const struct cpumask *l_cpumask;
> > > +	int cpu, nid;
> > > +	int *cpus_opal_rc;
> > > +
> > > +	if (!cpumask_empty(&nest_imc_cpumask))
> > > +		return 0;
> > What's that for? Paranoia engineering?
> 
> No.  The idea here is to generate the cpu_mask attribute
> field only for the first "nest" pmu and use the same
> for other "nest" units.

Why is nest_pmu_cpumask_init() called more than once? If it is, then you
should have a proper flag marking it initialized rather than making a
completely obscure check on a cpu mask. That mask could be empty for the
second invocation when the first invocation fails.

> > But this whole function is completely overengineered. If you make that
> > nest_init() part of the online function then this works also for nodes
> > which come online later and it simplifies to:
> > 
> > static int ppc_nest_imc_cpu_online(unsigned int cpu)
> > {
> > 	const struct cpumask *l_cpumask;
> > 	static struct cpumask tmp_mask;
> > 	int res;
> > 
> > 	/* Get the cpumask of this node */
> > 	l_cpumask = cpumask_of_node(cpu_to_node(cpu));
> > 
> > 	/*
> > 	 * If this is not the first online CPU on this node, then IMC is
> > 	 * initialized already.
> > 	 */
> > 	if (cpumask_and(&tmp_mask, l_cpumask, &nest_imc_cpumask))
> > 		return 0;
> > 
> > 	/*
> > 	 * If this fails, IMC is not usable.
> > 	 *
> > 	 * FIXME: Add a understandable comment what this actually does
> > 	 * and why it can fail.
> > 	 */
> > 	res = opal_imc_counters_stop(OPAL_IMC_COUNTERS_NEST);
> > 	if (res)
> > 		return res;
> > 
> > 	/* Make this CPU the designated target for counter collection */
> > 	cpumask_set_cpu(cpu, &nest_imc_cpumask);
> > 	nest_change_cpu_context(-1, cpu);
> > 	return 0;
> > }
> > 
> > static int nest_pmu_cpumask_init(void)
> > {
> > 	return cpuhp_setup_state(CPUHP_AP_PERF_POWERPC_NEST_ONLINE,
> > 				 "perf/powerpc/imc:online",
> > 				 ppc_nest_imc_cpu_online,
> > 				 ppc_nest_imc_cpu_offline);
> > }
> > 
> > Hmm?
> 
> Yes this make sense. But we need to first designate a cpu in each
> chip at init setup and use opal api to disable the engine in the same.
> So probably, after cpuhp_setup_state, can we do that?

Errm. That's what ppc_nest_imc_cpu_online() does.

It checks whether this is the first online cpu on a node and if yes, it
calls opal_imc_counters_stop() and sets that cpu in nest_imc_cpumask.

cpuhp_setup_state() invokes the callback on each online CPU, so
everything is set up properly after that.

Thanks,

	tglx


* Re: [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging
  2017-05-04 14:19 ` [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging Anju T Sudhakar
@ 2017-05-17  7:57   ` Stewart Smith
  0 siblings, 0 replies; 26+ messages in thread
From: Stewart Smith @ 2017-05-17  7:57 UTC (permalink / raw)
  To: Anju T Sudhakar, mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, dja, eranian, hemant, maddy, anju

Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> --- a/arch/powerpc/include/asm/imc-pmu.h
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -24,6 +24,7 @@
>   */
>  #define IMC_MAX_CHIPS			32
>  #define IMC_MAX_PMUS			32
> +#define IMC_MAX_CORES			32
>
>  /*
>   * This macro is used for memory buffer allocation of
> @@ -38,6 +39,11 @@
>  #define IMC_NEST_MAX_PAGES		64
>
>  /*
> + * IMC Core engine expects 8K bytes of memory for counter collection.
> + */
> +#define IMC_CORE_COUNTER_MEM		8192

Any reason for this not to be in the device tree?

This is the size of memory that Linux needs to give OPAL, so the size of
that should probably come from OPAL rather than Linux.

Otherwise, if in the future, we had a counter with an offset greater
than 8192 in, we'd have no way to detect that and we'd fail silently by
overwriting memory.

-- 
Stewart Smith
OPAL Architect, IBM.


end of thread, other threads:[~2017-05-17  7:57 UTC | newest]

Thread overview: 26+ messages
2017-05-04 14:19 [PATCH v8 00/10] IMC Instrumentation Support Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
2017-05-11  7:49   ` Stewart Smith
2017-05-12  4:36     ` Madhavan Srinivasan
2017-05-04 14:19 ` [PATCH v8 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 04/10] powerpc/perf: Add generic IMC pmu groupand event functions Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 05/10] powerpc/perf: IMC pmu cpumask and cpuhotplug support Anju T Sudhakar
2017-05-08 14:12   ` Daniel Axtens
2017-05-09  6:10     ` Madhavan Srinivasan
2017-05-12  2:18       ` Stewart Smith
2017-05-12  3:33         ` Michael Ellerman
2017-05-12  3:50           ` Madhavan Srinivasan
2017-05-12  3:41         ` Madhavan Srinivasan
2017-05-09 10:54     ` Anju T Sudhakar
2017-05-10 12:09   ` Thomas Gleixner
2017-05-10 23:40     ` Stephen Rothwell
2017-05-11  8:39       ` Thomas Gleixner
2017-05-15 10:12     ` Madhavan Srinivasan
2017-05-15 11:06       ` Thomas Gleixner
2017-05-04 14:19 ` [PATCH v8 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 07/10] powerpc/perf: PMU functions for Core IMC and hotplugging Anju T Sudhakar
2017-05-17  7:57   ` Stewart Smith
2017-05-04 14:19 ` [PATCH v8 08/10] powerpc/powernv: Thread IMC events detection Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
2017-05-04 14:19 ` [PATCH v8 10/10] powerpc/perf: Thread imc cpuhotplug support Anju T Sudhakar
