linux-kernel.vger.kernel.org archive mirror
* [PATCH v12 00/10] IMC Instrumentation Support
@ 2017-07-03  9:37 Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
                   ` (6 more replies)
  0 siblings, 7 replies; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Power9 has In-Memory-Collection (IMC) infrastructure which contains             
various Performance Monitoring Units (PMUs) at Nest level (these are            
on-chip but off-core), Core level and Thread level.                             
                                                                                
The Nest PMU counters are handled by a Nest IMC microcode which runs            
in the OCC (On-Chip Controller) complex. The microcode collects the             
counter data and moves the nest IMC counter data to memory.                     
                                                                                
The Core and Thread IMC PMU counters are handled in the core. Core              
level PMU counters give us the IMC counters' data per core and thread           
level PMU counters give us the IMC counters' data per CPU thread.               
                                                                                
This patchset enables the nest IMC, core IMC and thread IMC                     
PMUs and is based on the initial work done by Madhavan Srinivasan.              
"Nest Instrumentation Support" :                                                
https://lists.ozlabs.org/pipermail/linuxppc-dev/2015-August/132078.html         
                                                                                
v1 for this patchset can be found here :                                        
https://lwn.net/Articles/705475/                                                
                                                                                
Nest events:                                                                    
Per-chip nest instrumentation provides various per-chip metrics                 
such as memory, powerbus, Xlink and Alink bandwidth.                            
                                                                                
Core events:                                                                    
Per-core IMC instrumentation provides various per-core metrics                  
such as non-idle cycles, non-idle instructions, various cache and               
memory related metrics etc.                                                     
                                                                                
Thread events:                                                                  
All the events for thread level are same as core level with the                 
difference being in the domain. These are per-cpu metrics.                      
                                                                                
PMU Events' Information:                                                        
OPAL obtains the IMC PMU and event information from the IMC Catalog             
and passes on to the kernel via the device tree. The events' information        
contains :                                                                      
 - Event name                                                                   
 - Event Offset                                                                 
 - Event description                                                            
and, maybe :                                                                    
 - Event scale                                                                  
 - Event unit                                                                   
                                                                                
Some PMUs may have a common scale and unit values for all their                 
supported events. For those cases, the scale and unit properties for            
those events must be inherited from the PMU.                                    
                                                                                
The event offset in the memory is where the counter data gets                   
accumulated.                                                                    
                                                                                
The OPAL-side patches are upstream :                                            
https://lists.ozlabs.org/pipermail/skiboot/2017-June/007885.html    

The kernel discovers the IMC counters information in the device tree            
at the "imc-counters" device node which has a compatible field                  
"ibm,opal-in-memory-counters".                                                  
                                                                                
Parsing of the Events' information:                                             
To parse the IMC PMUs and events information, the kernel has to                 
discover the "imc-counters" node and walk through the pmu and event             
nodes.                                                                          
                                                                                
Here is an excerpt of the dt showing the imc-counters with                      
mcs (nest), core and thread node:                                               
                                                                                
/dts-v1/;                                                                       
                                                                                
/ {                                                                             
        name = "";                                                              
        compatible = "ibm,opal-in-memory-counters";                             
        #address-cells = <0x1>;                                                 
        #size-cells = <0x1>;                                                    
        version-id = "";                                                        
                                                                                
        NEST_MCS: nest-mcs-events {                                             
                #address-cells = <0x1>;                                         
                #size-cells = <0x1>;                                            
                                                                                
                event@0 {
                        event-name = "RRTO_QFULL_NO_DISP" ;                     
                        reg = <0x0 0x8>;                                        
                        desc = "RRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid RRTO op is not dispatched due to a command list full condition" ;
                };                                                              
                event@8 {
                        event-name = "WRTO_QFULL_NO_DISP" ;                     
                        reg = <0x8 0x8>;                                        
                        desc = "WRTO not dispatched in MCS0 due to capacity - pulses once for each time a valid WRTO op is not dispatched due to a command list full condition" ;
                };                                                              
                [...]                                                           
        mcs01 {                                                                 
                compatible = "ibm,imc-counters";                                
                events-prefix = "PM_MCS01_";                                    
                unit = "";                                                      
                scale = "";                                                     
                reg = <0x118 0x8>;                                              
                events = < &NEST_MCS >;                                         
                type = <0x10>;                                                  
        };                                                                      
        mcs23 {                                                                 
                compatible = "ibm,imc-counters";                                
                events-prefix = "PM_MCS23_";                                    
                unit = "";                                                      
                scale = "";                                                     
                reg = <0x198 0x8>;                                              
                events = < &NEST_MCS >;                                         
                type = <0x10>;                                                  
        };                                                                      
        [...]                                                                   
                                                                                
        CORE_EVENTS: core-events {                                              
                #address-cells = <0x1>;                                         
                #size-cells = <0x1>;                                            
                                                                                
                event@e0 {
                        event-name = "0THRD_NON_IDLE_PCYC" ;                    
                        reg = <0xe0 0x8>;                                       
                        desc = "The number of processor cycles when all threads are idle" ;
                };                                                              
                event@120 {
                        event-name = "1THRD_NON_IDLE_PCYC" ;                    
                        reg = <0x120 0x8>;                                      
                        desc = "The number of processor cycles when exactly one SMT thread is executing non-idle code" ;
                };                                                              
                [...]                                                           
        core {                                                                  
                compatible = "ibm,imc-counters";                                
                events-prefix = "CPM_";                                         
                unit = "";                                                      
                scale = "";                                                     
                reg = <0x0 0x8>;                                                
                events = < &CORE_EVENTS >;                                      
                type = <0x4>;                                                   
        };                                                                      
                                                                                
        thread {                                                                
                compatible = "ibm,imc-counters";                                
                events-prefix = "CPM_";                                         
                unit = "";                                                      
                scale = "";                                                     
                reg = <0x0 0x8>;                                                
                events = < &CORE_EVENTS >;                                      
                type = <0x1>;                                                   
        };                                                                      
};                                                                              
                                                                                
From the device tree, the kernel parses the PMUs and their events'
information.                                                                    
                                                                                
After parsing the IMC PMUs and their events, the PMUs and their                 
attributes are registered in the kernel.                                        
                                                                                
This patchset (patches 9 and 10) configures the thread level IMC PMUs
to count for tasks, which gives us the thread level metric values per
task.

Example Usage :
 # perf list                                                                    
                                                                                
  [...]                                                                         
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT01/            [Kernel PMU event]         
  nest_mcs01/PM_MCS01_64B_RD_DISP_PORT23/            [Kernel PMU event]         
                                                                                
  [...]                                                                         
  core_imc/CPM_0THRD_NON_IDLE_PCYC/                  [Kernel PMU event]         
  core_imc/CPM_1THRD_NON_IDLE_INST/                  [Kernel PMU event]         
  [...]                                                                         
  thread_imc/CPM_0THRD_NON_IDLE_PCYC/                [Kernel PMU event]         
  thread_imc/CPM_1THRD_NON_IDLE_INST/                [Kernel PMU event]         
                                                                                
To see per chip data for nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/ :
 # ./perf stat -e "nest_mcs01/PM_MCS01_64B_WR_DISP_PORT01/" -a --per-socket     
                                                                                
To see non-idle instructions for core 0 :                                       
 # ./perf stat -e "core_imc/CPM_NON_IDLE_INST/" -C 0 -I 1000                    
                                                                                
To see non-idle cycles for a "make" :
 # ./perf stat -e "thread_imc/CPM_NON_IDLE_PCYC/" make                          
                                                                                
Comments/feedback/suggestions are welcome.                                      
                                                                                
                                                                                
TODO:                                                                           
1) Add a sysfs interface to disable the Core IMC (both for ldbar and pdbar)
                                                                                
                                                                                
Changelog:                                                                      
                                                                                
v11 -> v12
 - cleanup_all_core_imc_memory() function updated.
 - is_core_imc_mem_inited function is made static.
 - code rearrangement is done
 - event_init functions for nest, core and thread are updated
   with a new logic to obtain the lock.
 - Updated the comments.

v10 -> v11                                                                      
                                                                                
 - cpuhotplug call unregistration for nest counters is handled.                 
 - nest counters are also disabled in case of kdump.
 - alloc_pages_node is used for memory allocation for core and thread,          
   instead of alloc_pages_exact_nid.                                            
 - base_addr calculations for nest, core, thread events are modified, as the    
   'config' now has more fields .                                               
 - event config fields are updated for nest,core and thread.                    
 - cpuhotplug function for nest,core and thread are modified.                   
 - opal-call api for start and stop is changed.                                 
                                                                                
v9 -> v10                                                                       
 - reworked the cpu hot plug functions for nest and core                        
 - Updated imc_get_mem_addr_nest                                                
 - Changed u64 vbase[IMC_MAX_PAGES]; to u64 *vbase[IMC_MAX_PAGES]; in struct imc_mem_info
                                                                                
v8 -> v9                                                                        
 - Updated nest, core, thread cpuhotplug functions.                             
 - PMU node parsing logic is changed as there is change in                      
   the ima-catalog file. PMU nodes are identified based on the                  
   "type" property.                                                             
 - Since imc-counters subtree accommodates the memory base
   address and offset for nest counter data, logic to get                       
   memory address for nest counters data is updated.                            
 - Memory allocation functions for core and thread are updated.                 
 - Data structures for imc instrumentation are updated.                         
 - pmu reserve/release functions for nest,core,thread are                       
   moved to *_imc_event_init.                                                   
 - Updated the comments.                                                        
 - Included necessary checks in core_imc_change_cpu_context()                   
                                                                                
v7 -> v8:                                                                       
 - opal-call API for nest and core is changed.                                  
   OPAL_NEST_IMC_COUNTERS_CONTROL and                                           
   OPAL_CORE_IMC_COUNTERS_CONTROL are replaced with
   OPAL_IMC_COUNTERS_INIT, OPAL_IMC_COUNTERS_START and                          
   OPAL_IMC_COUNTERS_STOP.                                                      
 - thread_ima doesn't have CPUMASK_ATTR, hence added a                          
   fix in patch 09/10, which will swap the IMC_EVENT_ATTR                       
   slot with IMC_CPUMASK_ATTR.                                                  

v6 -> v7:                                                                       
 - Updated the commit message and code comments.                                
 - Changed the counter init code to disable the                                 
   nest/core counters by default and enable only                                
   when it is used.                                                             
 - Updated the pmu-setup code to register the                                   
   PMUs which don't have events.
 - replaced imc_event_info_val() with imc_event_prop_update()
 - Updated the imc_pmu_setup() code, by checking for the "value"                
   of compatible property instead of merely checking for compatible.            
 - removed imc_get_domain().                                                    
 - init_imc_pmu() and imc_pmu_setup() are made  __init.                         
 - update_max_val() is invoked immediately after updating the offset value.     

v5 -> v6:                                                                       
 - merged a few patches for readability and code flow
 - Updated the commit message and code comments.                                
 - updated cpuhotplug code and added checks for perf migration context          
 - Added READ_ONCE() when reading the counter data.                             
 - replaced of_property_read_u32() with of_get_address() for "reg" property read
 - replaced UNKNOWN_DOMAIN with IMC_DOMAIN_UNKNOWN                              

 v4 -> v5:                                                                      
 - Updated opal call numbers                                                    
 - Added a patch to disable Core-IMC device using shutdown callback             
 - Added patch to support cpuhotplug for thread-imc                             
 - Added patch to disable and enable core imc engine in cpuhot plug path        

 v3 -> v4 :                                                                     
 - Changed the events parser code to discover the PMU and events because        
   of the changed format of the IMC DTS file (Patch 3).                         
 - Implemented the two TODOs to include core and thread IMC support with        
   this patchset (Patches 7 through 10).                                        
 - Changed the CPU hotplug code of Nest IMC PMUs to include a new state         
   CPUHP_AP_PERF_POWERPC_NEST_ONLINE (Patch 6).                                 

 v2 -> v3 :                                                                     
 - Changed all references for IMA (In-Memory Accumulation) to IMC (In-Memory    
   Collection).                                                                 

v1 -> v2 :                                                                     
 - Account for the cases where a PMU can have a common scale and unit           
   values for all its supported events (Patch 3/6).                             
 - Fixed a Build error (for maple_defconfig) by enabling imc_pmu.o              
   only for CONFIG_PPC_POWERNV=y (Patch 4/6)                                    
 - Read from the "event-name" property instead of "name" for an event           
   node (Patch 3/6).                                                            
                                                                                
                                                                                
                                                                                
                                                                                
Anju T Sudhakar (6):                                                            
  powerpc/powernv: Autoload IMC device driver module                            
  powerpc/perf: Add generic IMC pmu group and event functions                   
  powerpc/perf: IMC pmu cpumask and cpuhotplug support                          
  powerpc/powernv: Thread IMC events detection                                  
  powerpc/perf: Thread IMC PMU functions                                        
  powerpc/perf: Thread imc cpuhotplug support                                   
                                                                                
Madhavan Srinivasan (4):                                                        
  powerpc/powernv: Data structure and macros definitions for IMC                
  powerpc/powernv: Detect supported IMC units and its events                    
  powerpc/powernv: Core IMC events detection                                    
  powerpc/perf: PMU functions for Core IMC and hotplugging                     


 arch/powerpc/include/asm/imc-pmu.h             |  129 +++
 arch/powerpc/include/asm/opal-api.h            |   11 +-
 arch/powerpc/include/asm/opal.h                |    4 +
 arch/powerpc/perf/Makefile                     |    3 +
 arch/powerpc/perf/imc-pmu.c                    | 1137 ++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/Kconfig         |   10 +
 arch/powerpc/platforms/powernv/Makefile        |    1 +
 arch/powerpc/platforms/powernv/opal-imc.c      |  569 ++++++++++++
 arch/powerpc/platforms/powernv/opal-wrappers.S |    3 +
 arch/powerpc/platforms/powernv/opal.c          |   18 +
 include/linux/cpuhotplug.h                     |    3 +
 11 files changed, 1887 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h
 create mode 100644 arch/powerpc/perf/imc-pmu.c
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

-- 
2.11.0


* [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-07  9:26   ` Michael Ellerman
  2017-07-03  9:37 ` [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

Create a new header file to add the data structures and
macros needed for In-Memory Collection (IMC) counter support.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h | 99 ++++++++++++++++++++++++++++++++++++++
 1 file changed, 99 insertions(+)
 create mode 100644 arch/powerpc/include/asm/imc-pmu.h

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
new file mode 100644
index 000000000000..ffaea0b9c13e
--- /dev/null
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -0,0 +1,99 @@
+#ifndef PPC_POWERNV_IMC_PMU_DEF_H
+#define PPC_POWERNV_IMC_PMU_DEF_H
+
+/*
+ * IMC Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <linux/of.h>
+#include <linux/io.h>
+#include <asm/opal.h>
+
+/*
+ * For static allocation of some of the structures.
+ */
+#define IMC_MAX_PMUS			32
+
+/*
+ * This macro is used for memory buffer allocation of
+ * event names and event string
+ */
+#define IMC_MAX_NAME_VAL_LEN		96
+
+/*
+ * Currently Microcode supports a max of 256KB of counter memory
+ * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
+ */
+#define IMC_MAX_PAGES			64
+
+/*
+ * Compatibility macros for IMC devices
+ */
+#define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
+#define IMC_DTB_UNIT_COMPAT		"ibm,imc-counters"
+
+/*
+ * Structure to hold memory address information for imc units.
+ */
+struct imc_mem_info {
+	u32 id;
+	u64 *vbase[IMC_MAX_PAGES];
+};
+
+/*
+ * Place holder for nest pmu events and values.
+ */
+struct imc_events {
+	char *ev_name;
+	char *ev_value;
+};
+
+#define IMC_FORMAT_ATTR		0
+#define IMC_CPUMASK_ATTR	1
+#define IMC_EVENT_ATTR		2
+#define IMC_NULL_ATTR		3
+
+/*
+ * Device tree parser code detects IMC pmu support and
+ * registers new IMC pmus. This structure will hold the
+ * pmu functions, events, counter memory information
+ * and attrs for each imc pmu and will be referenced at
+ * the time of pmu registration.
+ */
+struct imc_pmu {
+	struct pmu pmu;
+	int domain;
+	/*
+	 * flag to notify whether the memory is mmaped
+	 * or allocated by kernel.
+	 */
+	int imc_counter_mmaped;
+	struct imc_mem_info *mem_info;
+	struct imc_events *events;
+	u32 counter_mem_size;
+	/*
+	 * Attribute groups for the PMU. Slot 0 is used for the
+	 * format attribute, slot 1 for the cpumask attribute and
+	 * slot 2 for the event attribute. Slot 3 is kept as
+	 * NULL.
+	 */
+	const struct attribute_group *attr_groups[4];
+};
+
+/*
+ * Domains for IMC PMUs
+ */
+#define IMC_DOMAIN_NEST		1
+
+#endif /* PPC_POWERNV_IMC_PMU_DEF_H */
-- 
2.11.0


* [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-07  6:53   ` Michael Ellerman
  2017-07-03  9:37 ` [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Code to create a platform device for the IMC counters.
Platform devices are created based on the IMC compatibility
string.

A new config flag, "CONFIG_HV_PERF_IMC_CTRS", is added to contain the
IMC counter changes.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/platforms/powernv/Kconfig    | 10 +++++
 arch/powerpc/platforms/powernv/Makefile   |  1 +
 arch/powerpc/platforms/powernv/opal-imc.c | 73 +++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal.c     | 18 ++++++++
 4 files changed, 102 insertions(+)
 create mode 100644 arch/powerpc/platforms/powernv/opal-imc.c

diff --git a/arch/powerpc/platforms/powernv/Kconfig b/arch/powerpc/platforms/powernv/Kconfig
index 6a6f4ef46b9e..543c6cd5e8d3 100644
--- a/arch/powerpc/platforms/powernv/Kconfig
+++ b/arch/powerpc/platforms/powernv/Kconfig
@@ -30,3 +30,13 @@ config OPAL_PRD
 	help
 	  This enables the opal-prd driver, a facility to run processor
 	  recovery diagnostics on OpenPower machines
+
+config HV_PERF_IMC_CTRS
+       bool "Hypervisor supplied In Memory Collection PMU events (Nest & Core)"
+       default y
+       depends on PERF_EVENTS && PPC_POWERNV
+       help
+	  Enable access to hypervisor supplied in-memory collection counters
+	  in perf. IMC counters are available from Power9 systems.
+
+          If unsure, select Y.
diff --git a/arch/powerpc/platforms/powernv/Makefile b/arch/powerpc/platforms/powernv/Makefile
index b5d98cb3f482..715e531f6711 100644
--- a/arch/powerpc/platforms/powernv/Makefile
+++ b/arch/powerpc/platforms/powernv/Makefile
@@ -12,3 +12,4 @@ obj-$(CONFIG_PPC_SCOM)	+= opal-xscom.o
 obj-$(CONFIG_MEMORY_FAILURE)	+= opal-memory-errors.o
 obj-$(CONFIG_TRACEPOINTS)	+= opal-tracepoints.o
 obj-$(CONFIG_OPAL_PRD)	+= opal-prd.o
+obj-$(CONFIG_HV_PERF_IMC_CTRS) += opal-imc.o
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
new file mode 100644
index 000000000000..5b1045c81af4
--- /dev/null
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -0,0 +1,73 @@
+/*
+ * OPAL IMC interface detection driver
+ * Supported on POWERNV platform
+ *
+ * Copyright	(C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *		(C) 2017 Anju T Sudhakar, IBM Corporation.
+ *		(C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ */
+#include <linux/kernel.h>
+#include <linux/module.h>
+#include <linux/platform_device.h>
+#include <linux/miscdevice.h>
+#include <linux/fs.h>
+#include <linux/of.h>
+#include <linux/of_address.h>
+#include <linux/of_platform.h>
+#include <linux/poll.h>
+#include <linux/mm.h>
+#include <linux/slab.h>
+#include <linux/crash_dump.h>
+#include <asm/opal.h>
+#include <asm/io.h>
+#include <asm/uaccess.h>
+#include <asm/cputable.h>
+#include <asm/imc-pmu.h>
+
+static int opal_imc_counters_probe(struct platform_device *pdev)
+{
+	struct device_node *imc_dev = NULL;
+
+	if (!pdev || !pdev->dev.of_node)
+		return -ENODEV;
+
+	/*
+	 * Check whether this is kdump kernel. If yes, just return.
+	 */
+	if (is_kdump_kernel())
+		return -ENODEV;
+
+	imc_dev = pdev->dev.of_node;
+	if (!imc_dev)
+		return -ENODEV;
+
+	return 0;
+}
+
+static const struct of_device_id opal_imc_match[] = {
+	{ .compatible = IMC_DTB_COMPAT },
+	{},
+};
+
+static struct platform_driver opal_imc_driver = {
+	.driver = {
+		.name = "opal-imc-counters",
+		.of_match_table = opal_imc_match,
+	},
+	.probe = opal_imc_counters_probe,
+};
+
+MODULE_DEVICE_TABLE(of, opal_imc_match);
+module_platform_driver(opal_imc_driver);
+MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
+MODULE_LICENSE("GPL");
diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
index 59684b4af4d1..fbdca259ea76 100644
--- a/arch/powerpc/platforms/powernv/opal.c
+++ b/arch/powerpc/platforms/powernv/opal.c
@@ -14,6 +14,7 @@
 #include <linux/printk.h>
 #include <linux/types.h>
 #include <linux/of.h>
+#include <linux/of_address.h>
 #include <linux/of_fdt.h>
 #include <linux/of_platform.h>
 #include <linux/interrupt.h>
@@ -30,6 +31,7 @@
 #include <asm/opal.h>
 #include <asm/firmware.h>
 #include <asm/mce.h>
+#include <asm/imc-pmu.h>
 
 #include "powernv.h"
 
@@ -705,6 +707,17 @@ static void opal_pdev_init(const char *compatible)
 		of_platform_device_create(np, NULL, NULL);
 }
 
+#ifdef CONFIG_HV_PERF_IMC_CTRS
+static void __init opal_imc_init_dev(void)
+{
+	struct device_node *np;
+
+	np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
+	if (np)
+		of_platform_device_create(np, NULL, NULL);
+}
+#endif
+
 static int kopald(void *unused)
 {
 	unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
@@ -778,6 +791,11 @@ static int __init opal_init(void)
 	/* Setup a heatbeat thread if requested by OPAL */
 	opal_init_heartbeat();
 
+#ifdef CONFIG_HV_PERF_IMC_CTRS
+	/* Detect IMC pmu counters support and create PMUs */
+	opal_imc_init_dev();
+#endif
+
 	/* Create leds platform devices */
 	leds = of_find_node_by_path("/ibm,opal/leds");
 	if (leds) {
-- 
2.11.0


* [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-06 13:48   ` Michael Ellerman
  2017-07-03  9:37 ` [PATCH v12 04/10] powerpc/perf: Add generic IMC pmu group and event functions Anju T Sudhakar
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

Parse device tree to detect IMC units. Traverse through each IMC unit
node to find supported events and corresponding unit/scale files (if any).

The device tree for IMC counters starts at the node "imc-counters".
This node contains all the IMC PMU nodes and the event nodes
for these IMC PMUs. Each PMU node has an "events" property which holds a
phandle to the actual events node. The events are separated from
the PMU nodes to factor out the common events. For example, the PMU nodes
"mcs0", "mcs1", etc. point to "nest-mcs-events", since
the events are common between these PMUs. These events take a different
prefix depending on the PMU they belong to, and hence the PMU
nodes themselves contain an "events-prefix" property. The value of this
property, prepended to the event name, forms the actual event
name. Each PMU node also has a "reg" field, the base offset for the events
which belong to this PMU. This "reg" field is added to the event's "reg" field
in the "events" node, which gives the location of the counter data. Kernel
code uses this offset as the event configuration value.
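
As an illustration of the naming and offset rules above, here is a minimal
userspace sketch. It is not the kernel code: the struct and function names
are hypothetical, and only the "prefix + event name" and "PMU reg + event
reg" rules are taken from the description.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for values read from the device tree. */
struct sk_pmu_desc {
	const char *events_prefix;	/* PMU node "events-prefix" */
	uint32_t reg;			/* PMU node "reg": base offset */
};

struct sk_event_desc {
	const char *event_name;		/* event node "event-name" */
	uint32_t reg;			/* event node "reg": offset within the PMU */
};

/* Build "<prefix><event-name>"; returns -1 if it does not fit in len. */
static int sk_event_name(const struct sk_pmu_desc *pmu,
			 const struct sk_event_desc *ev,
			 char *name, size_t len)
{
	if (strlen(pmu->events_prefix) + strlen(ev->event_name) >= len)
		return -1;
	return snprintf(name, len, "%s%s", pmu->events_prefix, ev->event_name);
}

/* Build the "event=0x..." string from PMU base offset + event offset. */
static int sk_event_value(const struct sk_pmu_desc *pmu,
			  const struct sk_event_desc *ev,
			  char *value, size_t len)
{
	return snprintf(value, len, "event=0x%x",
			(unsigned int)(pmu->reg + ev->reg));
}
```

With a made-up prefix "PM_MCS0_", event name "READ", PMU reg 0x100 and
event reg 0x18, this yields the name "PM_MCS0_READ" and the value
"event=0x118".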

The device tree parser code also looks for "scale" and "unit" properties in
the event node and passes their values on as event attrs, which the perf
tool uses in post-processing. Some PMUs have common scale
and unit properties, which means that all events supported by such a PMU
inherit the scale and unit properties of the PMU itself. For those
events, we need to set the common unit and scale values.
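
The precedence this implies can be sketched as follows. This is only an
illustration with hypothetical names; it shows the intended rule (an
event's own property wins, otherwise the PMU's common value is inherited),
not the way the parser in this patch stores its entries.

```c
#include <stddef.h>

/* Hypothetical pair of optional properties; NULL means "not present". */
struct sk_props {
	const char *scale;
	const char *unit;
};

/* Use the event's own value when present, else fall back to the PMU's. */
static struct sk_props sk_effective_props(struct sk_props pmu,
					  struct sk_props ev)
{
	struct sk_props out;

	out.scale = ev.scale ? ev.scale : pmu.scale;
	out.unit = ev.unit ? ev.unit : pmu.unit;
	return out;
}
```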

If any unit or event fails to initialize, disable that unit and
continue setting up the rest of them.

Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   5 +
 arch/powerpc/platforms/powernv/opal-imc.c | 439 +++++++++++++++++++++++++++++-
 2 files changed, 443 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index ffaea0b9c13e..2a0239e2590d 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -91,6 +91,11 @@ struct imc_pmu {
 	const struct attribute_group *attr_groups[4];
 };
 
+/* In-Memory Collection Counters Type */
+enum {
+	IMC_COUNTER_PER_CHIP            = 0x10,
+};
+
 /*
  * Domains for IMC PMUs
  */
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 5b1045c81af4..839c25718110 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,9 +34,437 @@
 #include <asm/cputable.h>
 #include <asm/imc-pmu.h>
 
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+static int imc_event_prop_update(char *name, struct imc_events *events)
+{
+	char *buf;
+
+	if (!events || !name)
+		return -EINVAL;
+
+	/* memory for content */
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	events->ev_name = name;
+	events->ev_value = buf;
+	return 0;
+}
+
+static int imc_event_prop_str(struct property *pp, char *name,
+			      struct imc_events *events)
+{
+	int ret;
+
+	ret = imc_event_prop_update(name, events);
+	if (ret)
+		return ret;
+
+	if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
+	   (pp->length > IMC_MAX_NAME_VAL_LEN))
+		return -EINVAL;
+	strncpy(events->ev_value, (const char *)pp->value, pp->length);
+
+	return 0;
+}
+
+static int imc_event_prop_val(char *name, u32 val,
+			      struct imc_events *events)
+{
+	int ret;
+
+	ret = imc_event_prop_update(name, events);
+	if (ret)
+		return ret;
+	snprintf(events->ev_value, IMC_MAX_NAME_VAL_LEN, "event=0x%x", val);
+
+	return 0;
+}
+
+static int set_event_property(struct property *pp, char *event_prop,
+			      struct imc_events *events, char *ev_name)
+{
+	char *buf;
+	int ret;
+
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf)
+		return -ENOMEM;
+
+	sprintf(buf, "%s.%s", ev_name, event_prop);
+	ret = imc_event_prop_str(pp, buf, events);
+	if (ret) {
+		if (events->ev_name)
+			kfree(events->ev_name);
+		if (events->ev_value)
+			kfree(events->ev_value);
+	}
+	return ret;
+}
+
+/*
+ * imc_events_node_parser: Parse the event node "dev" and assign the parsed
+ *                         information to event "events".
+ *
+ * Parses the "reg", "scale" and "unit" properties of this event.
+ * "reg" gives us the event offset in the counter memory.
+ */
+static int imc_events_node_parser(struct device_node *dev,
+				  struct imc_events *events,
+				  struct property *event_scale,
+				  struct property *event_unit,
+				  struct property *name_prefix,
+				  u32 reg, int pmu_domain)
+{
+	struct property *name, *pp;
+	char *ev_name;
+	u32 val;
+	int idx = 0, ret;
+
+	if (!dev)
+		goto fail;
+
+	/* Check for the "event-name" property, which is appended to the prefix */
+	name = of_find_property(dev, "event-name", NULL);
+	if (!name)
+		return -ENODEV;
+
+	if (!name->value ||
+	  (strnlen(name->value, name->length) == name->length) ||
+	  (name->length > IMC_MAX_NAME_VAL_LEN))
+		return -EINVAL;
+
+	ev_name = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!ev_name)
+		return -ENOMEM;
+
+	snprintf(ev_name, IMC_MAX_NAME_VAL_LEN, "%s%s",
+		 (char *)name_prefix->value,
+		 (char *)name->value);
+
+	/*
+	 * Parse each property of this event node "dev". Property "reg" has
+	 * the offset which is assigned to the event name. Other properties
+	 * like "scale" and "unit" are assigned to event.scale and event.unit
+	 * accordingly.
+	 */
+	for_each_property_of_node(dev, pp) {
+		/*
+		 * If there is an issue in parsing a single property of
+		 * this event, we just clean up the buffers, but we still
+		 * continue to parse. TODO: This could be rewritten to skip the
+		 * entire event node in case of parsing issues, but that can be
+		 * done later.
+		 */
+		if (strncmp(pp->name, "reg", 3) == 0) {
+			of_property_read_u32(dev, pp->name, &val);
+			val += reg;
+			ret = imc_event_prop_val(ev_name, val, &events[idx]);
+			if (ret) {
+				if (events[idx].ev_name)
+					kfree(events[idx].ev_name);
+				if (events[idx].ev_value)
+					kfree(events[idx].ev_value);
+				goto fail;
+			}
+			idx++;
+			/*
+			 * If the common scale and unit properties are available,
+			 * then, assign them to this event
+			 */
+			if (event_scale) {
+				ret = set_event_property(event_scale, "scale",
+							 &events[idx],
+							 ev_name);
+				if (ret)
+					goto fail;
+				idx++;
+			}
+			if (event_unit) {
+				ret = set_event_property(event_unit, "unit",
+							 &events[idx],
+							 ev_name);
+				if (ret)
+					goto fail;
+				idx++;
+			}
+		} else if (strncmp(pp->name, "unit", 4) == 0) {
+			/*
+			 * The event's unit and scale properties can override the
+			 * PMU's unit and scale properties, if present.
+			 */
+			ret = set_event_property(pp, "unit", &events[idx],
+						 ev_name);
+			if (ret)
+				goto fail;
+			idx++;
+		} else if (strncmp(pp->name, "scale", 5) == 0) {
+			ret = set_event_property(pp, "scale", &events[idx],
+						 ev_name);
+			if (ret)
+				goto fail;
+			idx++;
+		}
+	}
+
+	return idx;
+fail:
+	return -EINVAL;
+}
+
+/*
+ * get_nr_children : Returns the number of events(along with scale and unit)
+ * 		     for a pmu device node.
+ */
+static int get_nr_children(struct device_node *pmu_node)
+{
+	struct device_node *child;
+	int i = 0;
+
+	for_each_child_of_node(pmu_node, child)
+		i++;
+	return i;
+}
+
+/*
+ * imc_free_events : Cleanup the "events" list having "nr_entries" entries.
+ */
+static void imc_free_events(struct imc_events *events, int nr_entries)
+{
+	int i;
+
+	/* Nothing to clean, return */
+	if (!events)
+		return;
+
+	for (i = 0; i < nr_entries; i++) {
+		if (events[i].ev_name)
+			kfree(events[i].ev_name);
+		if (events[i].ev_value)
+			kfree(events[i].ev_value);
+	}
+
+	kfree(events);
+}
+
+/*
+ * imc_events_setup() : First finds the event node for the pmu and
+ *                      gets the number of supported events, then
+ * allocates memory for the same and parse the events.
+ */
+static int imc_events_setup(struct device_node *parent,
+					   int pmu_index,
+					   struct imc_pmu *pmu_ptr,
+					   u32 prop,
+					   int *idx)
+{
+	struct device_node *ev_node = NULL, *dir = NULL;
+	u32 reg;
+	struct property *scale_pp, *unit_pp, *name_prefix;
+	int ret = 0, nr_children = 0;
+
+	/*
+	 * Fetch the actual node where the events for this PMU exist.
+	 */
+	dir = of_find_node_by_phandle(prop);
+	if (!dir)
+		return -ENODEV;
+	/*
+	 * Get the maximum no. of events in this node.
+	 * Multiply by 3 to account for .scale and .unit properties
+	 * This number suggests the amount of memory needed to setup the
+	 * events for this pmu.
+	 */
+	nr_children = get_nr_children(dir) * 3;
+
+	pmu_ptr->events = kzalloc((sizeof(struct imc_events) * nr_children),
+			 GFP_KERNEL);
+	if (!pmu_ptr->events)
+		return -ENOMEM;
+
+	/*
+	 * Check if there is a common "scale" and "unit" properties inside
+	 * the PMU node for all the events supported by this PMU.
+	 */
+	scale_pp = of_find_property(parent, "scale", NULL);
+	unit_pp = of_find_property(parent, "unit", NULL);
+
+	/*
+	 * Get the event-prefix property from the PMU node
+	 * which needs to be attached with the event names.
+	 */
+	name_prefix = of_find_property(parent, "events-prefix", NULL);
+	if (!name_prefix)
+		goto free_events;
+
+	/*
+	 * "reg" property gives out the base offset of the counters data
+	 * for this PMU.
+	 */
+	of_property_read_u32(parent, "reg", &reg);
+
+	if (!name_prefix->value ||
+	   (strnlen(name_prefix->value, name_prefix->length) == name_prefix->length) ||
+	   (name_prefix->length > IMC_MAX_NAME_VAL_LEN))
+		goto free_events;
+
+	/* Loop through event nodes */
+	for_each_child_of_node(dir, ev_node) {
+		ret = imc_events_node_parser(ev_node, &pmu_ptr->events[*idx], scale_pp,
+				unit_pp, name_prefix, reg, pmu_ptr->domain);
+		if (ret < 0) {
+			/* Unable to parse this event */
+			if (ret == -ENOMEM)
+				goto free_events;
+			continue;
+		}
+
+		/*
+		 * imc_events_node_parser will return the number of
+		 * event entries created for this. This could include
+		 * event scale and unit files also.
+		 */
+		*idx += ret;
+	}
+	return 0;
+
+free_events:
+	imc_free_events(pmu_ptr->events, *idx);
+	return -ENODEV;
+
+}
+
+/* imc_get_mem_addr_nest: Function to get nest counter memory region for each chip */
+static int imc_get_mem_addr_nest(struct device_node *node,
+				 struct imc_pmu *pmu_ptr,
+				 u32 offset)
+{
+	int nr_chips = 0, i, j;
+	u64 *base_addr_arr, baddr;
+	u32 *chipid_arr, size = pmu_ptr->counter_mem_size, pages;
+
+	nr_chips = of_property_count_u32_elems(node, "chip-id");
+	if (!nr_chips)
+		return -ENODEV;
+
+	base_addr_arr = kzalloc((sizeof(u64) * nr_chips), GFP_KERNEL);
+	chipid_arr = kzalloc((sizeof(u32) * nr_chips), GFP_KERNEL);
+	if (!base_addr_arr || !chipid_arr)
+		return -ENOMEM;
+
+	of_property_read_u32_array(node, "chip-id", chipid_arr, nr_chips);
+	of_property_read_u64_array(node, "base-addr", base_addr_arr, nr_chips);
+
+	pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * nr_chips), GFP_KERNEL);
+	if (!pmu_ptr->mem_info) {
+		if (base_addr_arr)
+			kfree(base_addr_arr);
+		if (chipid_arr)
+			kfree(chipid_arr);
+
+		return -ENOMEM;
+	}
+
+	for (i = 0; i < nr_chips; i++) {
+		pmu_ptr->mem_info[i].id = chipid_arr[i];
+		baddr = base_addr_arr[i] + offset;
+		for (j = 0; j < (size/PAGE_SIZE); j++) {
+			pages = PAGE_SIZE * j;
+			pmu_ptr->mem_info[i].vbase[j] = phys_to_virt(baddr + pages);
+		}
+	}
+	return 0;
+}
+
+/*
+ * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
+ *		    and domain as the inputs.
+ * Allocates memory for the pmu, sets up its domain (NEST), and
+ * calls imc_events_setup() to allocate memory for the events supported
+ * by this pmu. Assigns a name for the pmu.
+ *
+ * If everything goes fine, it calls init_imc_pmu() to setup the pmu device
+ * and register it.
+ */
+static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
+{
+	u32 prop = 0;
+	struct property *pp;
+	char *buf;
+	int idx = 0, ret = 0;
+	struct imc_pmu *pmu_ptr;
+	u32 offset;
+
+	if (!parent)
+		return -EINVAL;
+
+	/* memory for pmu */
+	pmu_ptr = kzalloc(sizeof(struct imc_pmu), GFP_KERNEL);
+	if (!pmu_ptr)
+		return -ENOMEM;
+
+	pmu_ptr->domain = domain;
+
+	/* Needed for hotplug/migration */
+	per_nest_pmu_arr[pmu_index] = pmu_ptr;
+
+	pp = of_find_property(parent, "name", NULL);
+	if (!pp) {
+		ret = -ENODEV;
+		goto free_pmu;
+	}
+
+	if (!pp->value ||
+	   (strnlen(pp->value, pp->length) == pp->length) ||
+	   (pp->length > IMC_MAX_NAME_VAL_LEN)) {
+		ret = -EINVAL;
+		goto free_pmu;
+	}
+
+	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
+	if (!buf) {
+		ret = -ENOMEM;
+		goto free_pmu;
+	}
+	/* Save the name to register it later */
+	sprintf(buf, "nest_%s", (char *)pp->value);
+	pmu_ptr->pmu.name = (char *)buf;
+
+	if (of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size))
+		pmu_ptr->counter_mem_size = 0;
+
+	if (!of_property_read_u32(parent, "offset", &offset)) {
+		if (imc_get_mem_addr_nest(parent, pmu_ptr, offset))
+			goto free_pmu;
+		pmu_ptr->imc_counter_mmaped = 1;
+	}
+
+	/*
+	 * "events" property inside a PMU node contains the phandle value
+	 * for the actual events node. The "events" node for the IMC PMU
+	 * is not in this node, rather inside "imc-counters" node, since,
+	 * we want to factor out the common events (thereby, reducing the
+	 * size of the device tree)
+	 */
+	if (!of_property_read_u32(parent, "events", &prop)) {
+		if (prop)
+			imc_events_setup(parent, pmu_index, pmu_ptr, prop, &idx);
+	}
+	return 0;
+
+free_pmu:
+	if (pmu_ptr)
+		kfree(pmu_ptr);
+	return ret;
+}
+
 static int opal_imc_counters_probe(struct platform_device *pdev)
 {
 	struct device_node *imc_dev = NULL;
+	int pmu_count = 0, domain;
+	u32 type;
 
 	if (!pdev || !pdev->dev.of_node)
 		return -ENODEV;
@@ -50,7 +478,16 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 	imc_dev = pdev->dev.of_node;
 	if (!imc_dev)
 		return -ENODEV;
-
+	for_each_compatible_node(imc_dev, NULL, IMC_DTB_UNIT_COMPAT) {
+		if (of_property_read_u32(imc_dev, "type", &type))
+			continue;
+		if (type == IMC_COUNTER_PER_CHIP)
+			domain = IMC_DOMAIN_NEST;
+		else
+			continue;
+		if (!imc_pmu_create(imc_dev, pmu_count, domain))
+			pmu_count++;
+	}
 	return 0;
 }
 
-- 
2.11.0


* [PATCH v12 04/10] powerpc/perf: Add generic IMC pmu group and event functions
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (2 preceding siblings ...)
  2017-07-03  9:37 ` [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Device tree IMC driver code parses the IMC units and their events. It
passes the information to IMC pmu code which is placed in powerpc/perf
as "imc-pmu.c".

This patch adds a set of generic IMC PMU event functions to be
used by each IMC PMU unit. It adds code to set up the format attributes and
to register IMC PMUs, and adds an event_init function for nest_imc events.
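
For reference, the config layout exposed by the format attributes
(offset in bits 0-31, rvalue in bit 32, mode in bits 33-40) can be decoded
as in the userspace sketch below. The masks match the values added to
imc-pmu.h; the struct, the function names and the 64K page size are
assumptions made for illustration only.

```c
#include <stdint.h>

#define SK_OFFSET_MASK	0xffffffffULL		/* config:0-31 */
#define SK_RVALUE_MASK	0x100000000ULL		/* config:32 */
#define SK_MODE_MASK	0x1fe00000000ULL	/* config:33-40 */
#define SK_PAGE_SIZE	65536ULL		/* assumption: 64K pages */

struct sk_decoded {
	uint32_t offset;	/* counter offset in the PMU's memory */
	unsigned int rvalue;
	unsigned int mode;
	uint64_t page;		/* which page of the counter region */
	uint64_t in_page;	/* byte offset inside that page */
};

static struct sk_decoded sk_decode_config(uint64_t config)
{
	struct sk_decoded d;

	d.offset = (uint32_t)(config & SK_OFFSET_MASK);
	d.rvalue = (unsigned int)((config & SK_RVALUE_MASK) >> 32);
	d.mode = (unsigned int)((config & SK_MODE_MASK) >> 33);
	d.page = d.offset / SK_PAGE_SIZE;
	d.in_page = d.offset % SK_PAGE_SIZE;
	return d;
}

/* Same shape as the nest sanity check: offset in range, rvalue/mode zero. */
static int sk_config_is_valid(uint64_t config, uint32_t counter_mem_size)
{
	struct sk_decoded d = sk_decode_config(config);

	return d.offset <= counter_mem_size && !d.rvalue && !d.mode;
}
```

Note that the whole 64-bit config has to be examined for the rvalue and
mode checks to have any effect, since both fields live above bit 31.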

Since the IMC counters' data is periodically fed to a memory location,
the functions to read/update, start/stop and add/del can be generic and can
be used by all IMC PMU units.
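
The snapshot-and-delta scheme behind these generic functions can be
sketched in userspace as below. The names are hypothetical and a plain
byte buffer stands in for the counter memory; the only parts taken from
the patch are that the counter is a free-running big-endian 64-bit value
and that the delta since the last snapshot is accumulated.

```c
#include <stdint.h>

/* Hypothetical per-event state: where the counter lives and what we saw. */
struct sk_event {
	const uint8_t *addr;	/* big-endian 64-bit counter in memory */
	uint64_t prev;		/* last snapshot */
	uint64_t count;		/* accumulated delta */
};

/* Read a big-endian 64-bit value regardless of host endianness. */
static uint64_t sk_read_be64(const uint8_t *p)
{
	uint64_t v = 0;
	int i;

	for (i = 0; i < 8; i++)
		v = (v << 8) | p[i];
	return v;
}

/* start: the counter never stops, so just remember its current value. */
static void sk_event_start(struct sk_event *e)
{
	e->prev = sk_read_be64(e->addr);
}

/* read/stop: add the delta since the last snapshot, then re-snapshot. */
static void sk_event_update(struct sk_event *e)
{
	uint64_t now = sk_read_be64(e->addr);

	e->count += now - e->prev;
	e->prev = now;
}
```

Re-snapshotting in the update step is what keeps repeated reads from
counting the same interval twice.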

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   5 +
 arch/powerpc/perf/Makefile                |   3 +
 arch/powerpc/perf/imc-pmu.c               | 283 ++++++++++++++++++++++++++++++
 arch/powerpc/platforms/powernv/opal-imc.c |  11 +-
 4 files changed, 300 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/perf/imc-pmu.c

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 2a0239e2590d..25d0c57d14fe 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -63,6 +63,9 @@ struct imc_events {
 #define IMC_CPUMASK_ATTR	1
 #define IMC_EVENT_ATTR		2
 #define IMC_NULL_ATTR		3
+#define IMC_EVENT_OFFSET_MASK	0xffffffffULL
+#define IMC_EVENT_RVALUE_MASK	0x100000000ULL
+#define IMC_NEST_EVENT_MODE	0x1fe00000000ULL
 
 /*
  * Device tree parser code detects IMC pmu support and
@@ -101,4 +104,6 @@ enum {
  */
 #define IMC_DOMAIN_NEST		1
 
+extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern int init_imc_pmu(struct imc_events *events, int idx, struct imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 4d606b99a5cb..b29d918814d3 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -6,6 +6,9 @@ obj-$(CONFIG_PPC_PERF_CTRS)	+= core-book3s.o bhrb.o
 obj64-$(CONFIG_PPC_PERF_CTRS)	+= power4-pmu.o ppc970-pmu.o power5-pmu.o \
 				   power5+-pmu.o power6-pmu.o power7-pmu.o \
 				   isa207-common.o power8-pmu.o power9-pmu.o
+
+obj-$(CONFIG_HV_PERF_IMC_CTRS)	+= imc-pmu.o
+
 obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
new file mode 100644
index 000000000000..4e2f837b8bb7
--- /dev/null
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -0,0 +1,283 @@
+/*
+ * Nest Performance Monitor counter support.
+ *
+ * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
+ *           (C) 2017 Anju T Sudhakar, IBM Corporation.
+ *           (C) 2017 Hemant K Shaw, IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or later version.
+ */
+#include <linux/perf_event.h>
+#include <linux/slab.h>
+#include <asm/opal.h>
+#include <asm/imc-pmu.h>
+#include <asm/cputhreads.h>
+#include <asm/smp.h>
+#include <linux/string.h>
+
+/* Needed for sanity check */
+struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+
+struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
+{
+	return container_of(event->pmu, struct imc_pmu, pmu);
+}
+
+PMU_FORMAT_ATTR(event, "config:0-47");
+PMU_FORMAT_ATTR(offset, "config:0-31");
+PMU_FORMAT_ATTR(rvalue, "config:32");
+PMU_FORMAT_ATTR(mode, "config:33-40");
+static struct attribute *nest_imc_format_attrs[] = {
+	&format_attr_event.attr,
+	&format_attr_offset.attr,
+	&format_attr_rvalue.attr,
+	&format_attr_mode.attr,
+	NULL,
+};
+
+static struct attribute_group imc_format_group = {
+	.name = "format",
+	.attrs = nest_imc_format_attrs,
+};
+
+static int nest_imc_event_init(struct perf_event *event)
+{
+	int chip_id;
+	u64 l_config, config = event->attr.config;
+	struct imc_mem_info *pcni;
+	struct imc_pmu *pmu;
+	bool flag = false;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	pmu = imc_event_to_pmu(event);
+
+	/*
+	 * Sanity check for config (event offset, mode and rvalue).
+	 * mode and rvalue should be zero, if not just return.
+	 */
+	if (((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size) ||
+	    ((config & IMC_EVENT_RVALUE_MASK) != 0) ||
+	    ((config & IMC_NEST_EVENT_MODE) != 0))
+		return -EINVAL;
+
+	chip_id = topology_physical_package_id(event->cpu);
+	pcni = pmu->mem_info;
+	do {
+		if (pcni->id == chip_id) {
+			flag = true;
+			break;
+		}
+		pcni++;
+	} while (pcni);
+
+	if (!flag)
+		return -ENODEV;
+
+	/*
+	 * Memory for Nest HW counter data could be in multiple pages.
+	 * Hence check and pick the right event base page for chip with
+	 * "chip_id" and add "config" to it.
+	 */
+	l_config = config & IMC_EVENT_OFFSET_MASK;
+	event->hw.event_base = (u64)pcni->vbase[l_config/PAGE_SIZE] +
+			       (l_config & ~PAGE_MASK);
+	return 0;
+}
+
+static void imc_read_counter(struct perf_event *event)
+{
+	u64 *addr, data;
+
+	/*
+	 * In-Memory Collection (IMC) counters are free flowing counters.
+	 * So we take a snapshot of the counter value on enable and save it
+	 * to calculate the delta at later stage to present the event counter
+	 * value.
+	 */
+	addr = (u64 *)event->hw.event_base;
+	data = __be64_to_cpu(READ_ONCE(*addr));
+	local64_set(&event->hw.prev_count, data);
+}
+
+static void imc_perf_event_update(struct perf_event *event)
+{
+	u64 counter_prev, counter_new, final_count, *addr;
+
+	addr = (u64 *)event->hw.event_base;
+	counter_prev = local64_read(&event->hw.prev_count);
+	counter_new = __be64_to_cpu(READ_ONCE(*addr));
+	final_count = counter_new - counter_prev;
+
+	/*
+	 * prev_count needs to be updated here because the counter
+	 * could be read at periodic intervals from the tool side.
+	 */
+	local64_set(&event->hw.prev_count, counter_new);
+	/* Update the delta to the event count */
+	local64_add(final_count, &event->count);
+}
+
+static void imc_event_start(struct perf_event *event, int flags)
+{
+	/*
+	 * In Memory Counters are free flowing counters. HW or the microcode
+	 * keeps adding to the counter offset in memory. To get event
+	 * counter value, we snapshot the value here and we calculate
+	 * delta at later point.
+	 */
+	imc_read_counter(event);
+}
+
+static void imc_event_stop(struct perf_event *event, int flags)
+{
+	/*
+	 * Take a snapshot and calculate the delta and update
+	 * the event counter values.
+	 */
+	imc_perf_event_update(event);
+}
+
+static int imc_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		imc_event_start(event, flags);
+
+	return 0;
+}
+
+/* update_pmu_ops : Populate the appropriate operations for "pmu" */
+static int update_pmu_ops(struct imc_pmu *pmu)
+{
+	if (!pmu)
+		return -EINVAL;
+
+	pmu->pmu.task_ctx_nr = perf_invalid_context;
+	pmu->pmu.event_init = nest_imc_event_init;
+	pmu->pmu.add = imc_event_add;
+	pmu->pmu.del = imc_event_stop;
+	pmu->pmu.start = imc_event_start;
+	pmu->pmu.stop = imc_event_stop;
+	pmu->pmu.read = imc_perf_event_update;
+	pmu->attr_groups[IMC_FORMAT_ATTR] = &imc_format_group;
+	pmu->pmu.attr_groups = pmu->attr_groups;
+
+	return 0;
+}
+
+/* dev_str_attr : Populate event "name" and string "str" in attribute */
+static struct attribute *dev_str_attr(const char *name, const char *str)
+{
+	struct perf_pmu_events_attr *attr;
+
+	attr = kzalloc(sizeof(*attr), GFP_KERNEL);
+	if (!attr)
+		return NULL;
+	sysfs_attr_init(&attr->attr.attr);
+
+	attr->event_str = str;
+	attr->attr.attr.name = name;
+	attr->attr.attr.mode = 0444;
+	attr->attr.show = perf_event_sysfs_show;
+
+	return &attr->attr.attr;
+}
+
+/*
+ * update_events_in_group: Update the "events" information in an attr_group
+ *                         and assign the attr_group to the pmu "pmu".
+ */
+static int update_events_in_group(struct imc_events *events,
+				  int idx, struct imc_pmu *pmu)
+{
+	struct attribute_group *attr_group;
+	struct attribute **attrs;
+	int i;
+
+	/* If there are no events for this pmu, just return zero */
+	if (!events)
+		return 0;
+
+	/* Allocate memory for attribute group */
+	attr_group = kzalloc(sizeof(*attr_group), GFP_KERNEL);
+	if (!attr_group)
+		return -ENOMEM;
+
+	/* Allocate memory for attributes */
+	attrs = kzalloc((sizeof(struct attribute *) * (idx + 1)), GFP_KERNEL);
+	if (!attrs) {
+		kfree(attr_group);
+		return -ENOMEM;
+	}
+
+	attr_group->name = "events";
+	attr_group->attrs = attrs;
+	for (i = 0; i < idx; i++, events++) {
+		attrs[i] = dev_str_attr((char *)events->ev_name,
+					(char *)events->ev_value);
+	}
+
+	/* Save the event attribute */
+	pmu->attr_groups[IMC_EVENT_ATTR] = attr_group;
+	return 0;
+}
+
+/*
+ * init_imc_pmu : Setup and register the IMC pmu device.
+ *
+ * @events:	events memory for this pmu.
+ * @idx:	number of event entries created.
+ * @pmu_ptr:	memory allocated for this pmu.
+ */
+int init_imc_pmu(struct imc_events *events, int idx,
+		 struct imc_pmu *pmu_ptr)
+{
+	int ret;
+
+	ret = update_events_in_group(events, idx, pmu_ptr);
+	if (ret)
+		goto err_free;
+
+	ret = update_pmu_ops(pmu_ptr);
+	if (ret)
+		goto err_free;
+
+	ret = perf_pmu_register(&pmu_ptr->pmu, pmu_ptr->pmu.name, -1);
+	if (ret)
+		goto err_free;
+
+	pr_info("%s performance monitor hardware support registered\n",
+		pmu_ptr->pmu.name);
+
+	return 0;
+
+err_free:
+	/* Only free the attr_groups which are dynamically allocated  */
+	if (pmu_ptr->attr_groups[IMC_EVENT_ATTR]) {
+		if (pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs)
+			kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]->attrs);
+		kfree(pmu_ptr->attr_groups[IMC_EVENT_ATTR]);
+	}
+
+	return ret;
+}
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 839c25718110..a68d66d1ddb1 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -34,8 +34,6 @@
 #include <asm/cputable.h>
 #include <asm/imc-pmu.h>
 
-struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
-
 static int imc_event_prop_update(char *name, struct imc_events *events)
 {
 	char *buf;
@@ -452,8 +450,17 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 		if (prop)
 			imc_events_setup(parent, pmu_index, pmu_ptr, prop, &idx);
 	}
+	/* Function to register IMC pmu */
+	ret = init_imc_pmu(pmu_ptr->events, idx, pmu_ptr);
+	if (ret) {
+		pr_err("IMC PMU %s Register failed\n", pmu_ptr->pmu.name);
+		goto free_events;
+	}
 	return 0;
 
+free_events:
+	if (pmu_ptr->events)
+		imc_free_events(pmu_ptr->events, idx);
 free_pmu:
 	if (pmu_ptr)
 		kfree(pmu_ptr);
-- 
2.11.0


* [PATCH v12 06/10] powerpc/powernv: Core IMC events detection
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (3 preceding siblings ...)
  2017-07-03  9:37 ` [PATCH v12 04/10] powerpc/perf: Add generic IMC pmu group and event functions Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 08/10] powerpc/powernv: Thread " Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
  6 siblings, 0 replies; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>

This patch adds support for detection of core IMC events along with the
Nest IMC events. It adds a new domain, IMC_DOMAIN_CORE, which is determined
with the help of the "type" property in the IMC device tree.
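
A minimal sketch of the type-to-domain mapping follows. The constants
mirror the values in imc-pmu.h; the function itself is hypothetical and
only illustrates the idea, it is not the probe code from this patch.

```c
#include <stdint.h>

/* Values as defined in imc-pmu.h by this series. */
enum {
	SK_COUNTER_PER_CORE = 0x4,
	SK_COUNTER_PER_CHIP = 0x10,
};

#define SK_DOMAIN_NEST	1
#define SK_DOMAIN_CORE	2

/* Map the device tree "type" value to a domain; -1 means "skip this unit". */
static int sk_type_to_domain(uint32_t type)
{
	switch (type) {
	case SK_COUNTER_PER_CHIP:
		return SK_DOMAIN_NEST;
	case SK_COUNTER_PER_CORE:
		return SK_DOMAIN_CORE;
	default:
		return -1;
	}
}
```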

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |  3 +++
 arch/powerpc/perf/imc-pmu.c               |  2 ++
 arch/powerpc/platforms/powernv/opal-imc.c | 14 +++++++++++---
 3 files changed, 16 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index aeed903b2a79..24a6112ca0b5 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -107,6 +107,7 @@ struct imc_pmu_ref {
 
 /* In-Memory Collection Counters Type */
 enum {
+	IMC_COUNTER_PER_CORE		= 0x4,
 	IMC_COUNTER_PER_CHIP            = 0x10,
 };
 
@@ -114,7 +115,9 @@ enum {
  * Domains for IMC PMUs
  */
 #define IMC_DOMAIN_NEST		1
+#define IMC_DOMAIN_CORE		2
 
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
+extern struct imc_pmu *core_imc_pmu;
 extern int init_imc_pmu(struct imc_events *events, int idx, struct imc_pmu *pmu_ptr);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index ca9662bea7d6..041d3097d42a 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -31,6 +31,8 @@ static DEFINE_MUTEX(imc_nest_inited_reserve);
 
 struct imc_pmu_ref *nest_imc_refc;
 
+struct imc_pmu *core_imc_pmu;
+
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
 	return container_of(event->pmu, struct imc_pmu, pmu);
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 406f7c10850a..aeef59b66420 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -379,7 +379,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
  *		    and domain as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu.
  *
@@ -406,7 +406,10 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 	pmu_ptr->domain = domain;
 
 	/* Needed for hotplug/migration */
-	per_nest_pmu_arr[pmu_index] = pmu_ptr;
+	if (pmu_ptr->domain == IMC_DOMAIN_CORE)
+		core_imc_pmu = pmu_ptr;
+	else if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+		per_nest_pmu_arr[pmu_index] = pmu_ptr;
 
 	pp = of_find_property(parent, "name", NULL);
 	if (!pp) {
@@ -427,7 +430,10 @@ static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
 		goto free_pmu;
 	}
 	/* Save the name to register it later */
-	sprintf(buf, "nest_%s", (char *)pp->value);
+	if (pmu_ptr->domain == IMC_DOMAIN_NEST)
+		sprintf(buf, "nest_%s", (char *)pp->value);
+	else
+		sprintf(buf, "%s_imc", (char *)pp->value);
 	pmu_ptr->pmu.name = (char *)buf;
 
 	if (of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size))
@@ -505,6 +511,8 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 			continue;
 		if (type == IMC_COUNTER_PER_CHIP)
 			domain = IMC_DOMAIN_NEST;
+		else if (type == IMC_COUNTER_PER_CORE)
+			domain = IMC_DOMAIN_CORE;
 		else
 			continue;
 		if (!imc_pmu_create(imc_dev, pmu_count, domain))
-- 
2.11.0


* [PATCH v12 08/10] powerpc/powernv: Thread IMC events detection
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (4 preceding siblings ...)
  2017-07-03  9:37 ` [PATCH v12 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  2017-07-03  9:37 ` [PATCH v12 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
  6 siblings, 0 replies; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Add support for detection of thread IMC events. A new domain,
IMC_DOMAIN_THREAD, is added; it is determined with the help of the
"type" property in the IMC device tree.

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        | 2 ++
 arch/powerpc/platforms/powernv/opal-imc.c | 4 +++-
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index 24a6112ca0b5..e71e0d77d1d7 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -107,6 +107,7 @@ struct imc_pmu_ref {
 
 /* In-Memory Collection Counters Type */
 enum {
+	IMC_COUNTER_PER_THREAD		= 0x1,
 	IMC_COUNTER_PER_CORE		= 0x4,
 	IMC_COUNTER_PER_CHIP            = 0x10,
 };
@@ -116,6 +117,7 @@ enum {
  */
 #define IMC_DOMAIN_NEST		1
 #define IMC_DOMAIN_CORE		2
+#define IMC_DOMAIN_THREAD	3
 
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 91b8dd8d7619..2f857ec826e6 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -380,7 +380,7 @@ static int imc_get_mem_addr_nest(struct device_node *node,
 /*
  * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
  *		    and domain as the inputs.
- * Allocates memory for the pmu, sets up its domain (NEST/CORE), and
+ * Allocates memory for the pmu, sets up its domain (NEST/CORE/THREAD), and
  * calls imc_events_setup() to allocate memory for the events supported
  * by this pmu. Assigns a name for the pmu.
  *
@@ -531,6 +531,8 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
 			domain = IMC_DOMAIN_NEST;
 		else if (type == IMC_COUNTER_PER_CORE)
 			domain = IMC_DOMAIN_CORE;
+		else if (type == IMC_COUNTER_PER_THREAD)
+			domain = IMC_DOMAIN_THREAD;
 		else
 			continue;
 		if (!imc_pmu_create(imc_dev, pmu_count, domain))
-- 
2.11.0


* [PATCH v12 09/10] powerpc/perf: Thread IMC PMU functions
  2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
                   ` (5 preceding siblings ...)
  2017-07-03  9:37 ` [PATCH v12 08/10] powerpc/powernv: Thread " Anju T Sudhakar
@ 2017-07-03  9:37 ` Anju T Sudhakar
  6 siblings, 0 replies; 11+ messages in thread
From: Anju T Sudhakar @ 2017-07-03  9:37 UTC (permalink / raw)
  To: mpe
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Add the PMU functions required for event initialization, read, update,
add, del etc. for the thread IMC PMU. Thread IMC PMUs are used for
per-task monitoring.

For each CPU, a page of memory is allocated and kept static, i.e.,
these pages exist until the machine shuts down. The base address of
this page is written to the LDBAR of that cpu. As soon as we do that,
the thread IMC counters start running for that cpu and their data is
written to the allocated page. But we use this for per-task
monitoring. Whenever we start monitoring a task, the event is added
onto the task. At that point, we read the initial value of the
event. Whenever we stop monitoring the task, the final value is taken
and the difference is the event data.
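
For illustration, the LDBAR value is the page base address masked with
THREAD_IMC_LDBAR_MASK and combined with the THREAD_IMC_ENABLE bit, both
taken from this patch. The helper name thread_imc_ldbar_value() is made
up for this userspace sketch; the patch computes the value inline before
calling mtspr().

```c
#include <stdint.h>

#define THREAD_IMC_LDBAR_MASK	0x0003ffffffffe000ULL
#define THREAD_IMC_ENABLE	0x8000000000000000ULL

/* Hypothetical helper: builds the value that would be written to LDBAR */
uint64_t thread_imc_ldbar_value(uint64_t base)
{
	/* The mask keeps address bits 13..49; bit 63 turns the counters on */
	return (base & THREAD_IMC_LDBAR_MASK) | THREAD_IMC_ENABLE;
}
```

Because the mask clears the low 13 bits, any offset within an 8K region
is dropped from the address. Writing 0 to the register, as
thread_imc_ldbar_disable() does, clears the enable bit again.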

Now, a task can move to a different cpu. Suppose a task X is moving from
cpu A to cpu B. When the task is scheduled out of A, we get an
event_del for A, and hence the event data is updated, and we stop
updating X's event data. As soon as X moves on to B, event_add is
called for B, and we again update the event data. This is how the
event data keeps getting updated even when the task is scheduled on to
different cpus.
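
The accounting described above can be modelled in plain userspace C.
This is only a simulation under simplifying assumptions: one counter per
cpu stands in for the per-cpu counter page, and every name here
(cpu_counter, struct sim_event, sim_event_add(), sim_event_del(),
sim_migration()) is invented for the sketch.

```c
#include <assert.h>
#include <stdint.h>

#define NR_CPUS 2

/* Stand-in for the per-cpu counter pages: one free-running counter each */
static uint64_t cpu_counter[NR_CPUS];

struct sim_event {
	uint64_t prev_count;	/* counter value when the task was added */
	uint64_t count;		/* accumulated event data for the task */
};

/* Models event_add: snapshot the counter of the cpu the task lands on */
void sim_event_add(struct sim_event *ev, int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	ev->prev_count = cpu_counter[cpu];
}

/* Models event_del: add what the counter advanced while the task ran */
void sim_event_del(struct sim_event *ev, int cpu)
{
	uint64_t now;

	assert(cpu >= 0 && cpu < NR_CPUS);
	now = cpu_counter[cpu];
	ev->count += now - ev->prev_count;
	ev->prev_count = now;
}

/* Task X runs on cpu A (0), is scheduled out, then runs on cpu B (1) */
uint64_t sim_migration(void)
{
	struct sim_event ev = { 0, 0 };

	cpu_counter[0] = 1000;	/* counters run before X is monitored */
	cpu_counter[1] = 50;

	sim_event_add(&ev, 0);
	cpu_counter[0] += 30;	/* X runs on A */
	sim_event_del(&ev, 0);

	cpu_counter[0] += 500;	/* other tasks on A, not attributed to X */

	sim_event_add(&ev, 1);
	cpu_counter[1] += 12;	/* X runs on B */
	sim_event_del(&ev, 1);

	return ev.count;
}
```

In this run the event ends up with 30 + 12 = 42: only the increments
that happened between an add and the matching del are attributed to X,
regardless of the absolute counter values on either cpu.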

Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/imc-pmu.h        |   4 +
 arch/powerpc/perf/imc-pmu.c               | 241 ++++++++++++++++++++++++++++--
 arch/powerpc/platforms/powernv/opal-imc.c |   2 +
 3 files changed, 238 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
index e71e0d77d1d7..470301ac806b 100644
--- a/arch/powerpc/include/asm/imc-pmu.h
+++ b/arch/powerpc/include/asm/imc-pmu.h
@@ -44,6 +44,9 @@
 #define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
 #define IMC_DTB_UNIT_COMPAT		"ibm,imc-counters"
 
+#define THREAD_IMC_LDBAR_MASK           0x0003ffffffffe000ULL
+#define THREAD_IMC_ENABLE               0x8000000000000000ULL
+
 /*
  * Structure to hold memory address information for imc units.
  */
@@ -122,4 +125,5 @@ enum {
 extern struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 extern struct imc_pmu *core_imc_pmu;
 extern int init_imc_pmu(struct imc_events *events, int idx, struct imc_pmu *pmu_ptr);
+void thread_imc_disable(void);
 #endif /* PPC_POWERNV_IMC_PMU_DEF_H */
diff --git a/arch/powerpc/perf/imc-pmu.c b/arch/powerpc/perf/imc-pmu.c
index c1a275ed2510..bea4dafc2aad 100644
--- a/arch/powerpc/perf/imc-pmu.c
+++ b/arch/powerpc/perf/imc-pmu.c
@@ -18,6 +18,9 @@
 #include <asm/smp.h>
 #include <linux/string.h>
 
+/* Maintains base address for all the cpus */
+static DEFINE_PER_CPU(u64 *, thread_imc_mem);
+
 /* Needed for sanity check */
 struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
 static cpumask_t nest_imc_cpumask;
@@ -33,6 +36,7 @@ static DEFINE_MUTEX(imc_nest_inited_reserve);
 struct imc_pmu_ref *nest_imc_refc;
 struct imc_pmu_ref *core_imc_refc;
 struct imc_pmu *core_imc_pmu;
+static int thread_imc_mem_size;
 
 struct imc_pmu *imc_event_to_pmu(struct perf_event *event)
 {
@@ -568,6 +572,137 @@ static int core_imc_event_init(struct perf_event *event)
 	return 0;
 }
 
+static int thread_imc_event_init(struct perf_event *event)
+{
+	int rc, core_id;
+	u32 config = event->attr.config;
+	struct task_struct *target;
+	struct imc_pmu *pmu;
+	struct imc_pmu_ref *ref;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Sampling not supported */
+	if (event->hw.sample_period)
+		return -EINVAL;
+
+	event->hw.idx = -1;
+	pmu = imc_event_to_pmu(event);
+	core_id = event->cpu / threads_per_core;
+
+	/* Sanity check for config (event offset and rvalue) */
+	if (((config & IMC_EVENT_OFFSET_MASK) > pmu->counter_mem_size) ||
+	    ((config & IMC_EVENT_RVALUE_MASK) != 0))
+		return -EINVAL;
+
+	target = event->hw.target;
+	if (!target)
+		return -EINVAL;
+
+	if (!is_core_imc_mem_inited(event->cpu))
+		return -ENODEV;
+
+	event->pmu->task_ctx_nr = perf_sw_context;
+	core_id = event->cpu / threads_per_core;
+
+	/*
+	 * Core pmu units are enabled only when it is used.
+	 * See if this is triggered for the first time.
+	 * If yes, take the mutex lock and enable the core counters.
+	 * If not, just increment the count in core_imc_refc struct.
+	 */
+	ref = &core_imc_refc[core_id];
+	if (!ref)
+		return -EINVAL;
+
+	mutex_lock(&ref->lock);
+	if (ref->refc == 0) {
+		rc = opal_imc_counters_start(OPAL_IMC_COUNTERS_CORE,
+					     get_hard_smp_processor_id(event->cpu));
+		if (rc) {
+			mutex_unlock(&ref->lock);
+			pr_err("IMC: Unable to start the counters for core %d\n", core_id);
+			return rc;
+		}
+	}
+	++ref->refc;
+	mutex_unlock(&ref->lock);
+
+	event->destroy = core_imc_counters_release;
+	return 0;
+}
+
+static void thread_imc_read_counter(struct perf_event *event)
+{
+	u64 *addr, data;
+
+	addr = per_cpu(thread_imc_mem, smp_processor_id()) +
+		(event->attr.config & IMC_EVENT_OFFSET_MASK);
+	data = __be64_to_cpu(READ_ONCE(*addr));
+	local64_set(&event->hw.prev_count, data);
+}
+
+static void thread_imc_perf_event_update(struct perf_event *event)
+{
+	u64 counter_prev, counter_new, final_count, *addr;
+
+	addr = per_cpu(thread_imc_mem, smp_processor_id()) +
+		(event->attr.config & IMC_EVENT_OFFSET_MASK);
+	counter_prev = local64_read(&event->hw.prev_count);
+	counter_new = __be64_to_cpu(READ_ONCE(*addr));
+	final_count = counter_new - counter_prev;
+
+	local64_set(&event->hw.prev_count, counter_new);
+	local64_add(final_count, &event->count);
+}
+
+static void thread_imc_event_start(struct perf_event *event, int flags)
+{
+	thread_imc_read_counter(event);
+}
+
+static void thread_imc_event_stop(struct perf_event *event, int flags)
+{
+	thread_imc_perf_event_update(event);
+}
+
+static void thread_imc_event_del(struct perf_event *event, int flags)
+{
+	thread_imc_perf_event_update(event);
+}
+
+static int thread_imc_event_add(struct perf_event *event, int flags)
+{
+	thread_imc_event_start(event, flags);
+	return 0;
+}
+
+static void thread_imc_pmu_start_txn(struct pmu *pmu,
+				     unsigned int txn_flags)
+{
+	if (txn_flags & ~PERF_PMU_TXN_ADD)
+		return;
+	perf_pmu_disable(pmu);
+}
+
+static void thread_imc_pmu_cancel_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+}
+
+static int thread_imc_pmu_commit_txn(struct pmu *pmu)
+{
+	perf_pmu_enable(pmu);
+	return 0;
+}
+
+static void thread_imc_pmu_sched_task(struct perf_event_context *ctx,
+				      bool sched_in)
+{
+	return;
+}
+
 static void imc_read_counter(struct perf_event *event)
 {
 	u64 *addr, data;
@@ -650,7 +785,27 @@ static int update_pmu_ops(struct imc_pmu *pmu)
 	pmu->pmu.read = imc_perf_event_update;
 	pmu->attr_groups[IMC_CPUMASK_ATTR] = &imc_pmu_cpumask_attr_group;
 	pmu->pmu.attr_groups = pmu->attr_groups;
+	if (pmu->domain == IMC_DOMAIN_THREAD) {
+		pmu->pmu.event_init = thread_imc_event_init;
+		pmu->pmu.start = thread_imc_event_start;
+		pmu->pmu.add = thread_imc_event_add;
+		pmu->pmu.del = thread_imc_event_del;
+		pmu->pmu.stop = thread_imc_event_stop;
+		pmu->pmu.read = thread_imc_perf_event_update;
+		pmu->pmu.start_txn = thread_imc_pmu_start_txn;
+		pmu->pmu.cancel_txn = thread_imc_pmu_cancel_txn;
+		pmu->pmu.commit_txn = thread_imc_pmu_commit_txn;
+		pmu->pmu.sched_task = thread_imc_pmu_sched_task;
+		pmu->attr_groups[IMC_FORMAT_ATTR] = &core_imc_format_group;
 
+		/*
+		 * Since thread_imc does not have any CPUMASK attr,
+		 * this may drop the "events" attr all together.
+		 * So swap the IMC_EVENT_ATTR slot with IMC_CPUMASK_ATTR.
+		 */
+		pmu->attr_groups[IMC_CPUMASK_ATTR] = pmu->attr_groups[IMC_EVENT_ATTR];
+		pmu->attr_groups[IMC_EVENT_ATTR] = NULL;
+	}
 	return 0;
 }
 
@@ -748,28 +903,93 @@ static void cleanup_all_core_imc_memory(struct imc_pmu *pmu_ptr)
 }
 
 /*
+ * Allocates a page of memory for each of the online cpus, and, writes the
+ * physical base address of that page to the LDBAR for that cpu. This starts
+ * the thread IMC counters.
+ */
+static int thread_imc_mem_alloc(int cpu_id, int size)
+{
+	u64 ldbar_value, *local_mem;
+	int phys_id = topology_physical_package_id(cpu_id);
+
+	if (per_cpu(thread_imc_mem, cpu_id) != NULL)
+		return 0;
+
+	local_mem =  page_address(alloc_pages_node(phys_id,
+				  GFP_KERNEL | __GFP_ZERO | __GFP_THISNODE,
+				  get_order(size)));
+
+	if (!local_mem)
+		return -ENOMEM;
+
+	per_cpu(thread_imc_mem, cpu_id) = local_mem;
+
+	ldbar_value = ((u64)local_mem & (u64)THREAD_IMC_LDBAR_MASK) |
+	(u64)THREAD_IMC_ENABLE;
+
+	mtspr(SPRN_LDBAR, ldbar_value);
+	return 0;
+}
+
+/*
  * imc_mem_init : Function to support memory allocation for core imc.
  */
 static int imc_mem_init(struct imc_pmu *pmu_ptr)
 {
-	int nr_cores;
+	int nr_cores, res, cpu;
 
 	if (pmu_ptr->imc_counter_mmaped)
 		return 0;
 
-	nr_cores = num_present_cpus() / threads_per_core;
-	pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * nr_cores), GFP_KERNEL);
-	if (!pmu_ptr->mem_info)
-		return -ENOMEM;
+	switch (pmu_ptr->domain) {
+	case IMC_DOMAIN_CORE:
+		nr_cores = num_present_cpus() / threads_per_core;
+		pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * nr_cores), GFP_KERNEL);
+		if (!pmu_ptr->mem_info)
+			return -ENOMEM;
 
-	core_imc_refc = kzalloc((sizeof(struct imc_pmu_ref) * nr_cores),
-				 GFP_KERNEL);
-	if (!core_imc_refc)
-		return -ENOMEM;
+		core_imc_refc = kzalloc((sizeof(struct imc_pmu_ref) * nr_cores),
+					 GFP_KERNEL);
+		if (!core_imc_refc)
+			return -ENOMEM;
+
+		break;
+	case IMC_DOMAIN_THREAD:
+		thread_imc_mem_size = pmu_ptr->counter_mem_size;
+		for_each_online_cpu(cpu) {
+			res = thread_imc_mem_alloc(cpu, pmu_ptr->counter_mem_size);
+			if (res)
+				return res;
+		}
+		break;
+	default:
+		return -EINVAL;
+	}
 
 	return 0;
 }
 
+static void thread_imc_ldbar_disable(void *dummy)
+{
+	/* LDBAR spr is a per-thread */
+	mtspr(SPRN_LDBAR, 0);
+}
+
+void thread_imc_disable(void)
+{
+	on_each_cpu(thread_imc_ldbar_disable, NULL, 1);
+}
+
+static void cleanup_all_thread_imc_memory(void)
+{
+	int i;
+
+	for_each_online_cpu(i) {
+		if (per_cpu(thread_imc_mem, i))
+			free_pages((u64)per_cpu(thread_imc_mem, i), 0);
+	}
+}
+
 /*
  * init_imc_pmu : Setup and register the IMC pmu device.
  *
@@ -874,5 +1094,8 @@ int init_imc_pmu(struct imc_events *events, int idx,
 		cpuhp_remove_state(CPUHP_AP_PERF_POWERPC_CORE_IMC_ONLINE);
 		cleanup_all_core_imc_memory(pmu_ptr);
 	}
+	/* For thread_imc, we have allocated memory, we need to free it */
+	if (pmu_ptr->domain == IMC_DOMAIN_THREAD)
+		cleanup_all_thread_imc_memory();
 	return ret;
 }
diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
index 2f857ec826e6..ccb7060bdc18 100644
--- a/arch/powerpc/platforms/powernv/opal-imc.c
+++ b/arch/powerpc/platforms/powernv/opal-imc.c
@@ -545,6 +545,8 @@ static void opal_imc_counters_shutdown(struct platform_device *pdev)
 {
 	/* Disable the IMC Core functions */
 	disable_core_pmu_counters();
+	/* Disable the IMC Thread functions */
+	thread_imc_disable();
 }
 
 static const struct of_device_id opal_imc_match[] = {
-- 
2.11.0


* Re: [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events
  2017-07-03  9:37 ` [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
@ 2017-07-06 13:48   ` Michael Ellerman
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2017-07-06 13:48 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Hi Maddy/Anju,

Comments inline ...

Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>
> Parse device tree to detect IMC units. Traverse through each IMC unit
> node to find supported events and corresponding unit/scale files (if any).
>
> The device tree for IMC counters starts at the node "imc-counters".
> This node contains all the IMC PMU nodes and event nodes
> for these IMC PMUs. The PMU nodes have an "events" property which has a
> phandle value for the actual events node. The events are separated from
> the PMU nodes to abstract out the common events. For example, PMU node
> "mcs0", "mcs1" etc. will contain a pointer to "nest-mcs-events" since,
> the events are common between these PMUs. These events have a different
> prefix based on their relation to different PMUs, and hence, the PMU
> nodes themselves contain an "events-prefix" property. The value for this
> property concatenated to the event name, forms the actual event
> name. Also, the PMU have a "reg" field as the base offset for the events
> which belong to this PMU. This "reg" field is added to event's "reg" field
> in the "events" node, which gives us the location of the counter data. Kernel
> code uses this offset as event configuration value.
>
> Device tree parser code also looks for scale/unit property in the event
> node and passes on the value as an event attr for perf interface to use
> in the post processing by the perf tool. Some PMUs may have common scale
> and unit properties which implies that all events supported by this PMU
> inherit the scale and unit properties of the PMU itself. For those
> events, we need to set the common unit and scale values.
>
> For failure to initialize any unit or any event, disable that unit and
> continue setting up the rest of them.
>
> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/imc-pmu.h        |   5 +
>  arch/powerpc/platforms/powernv/opal-imc.c | 439 +++++++++++++++++++++++++++++-

Bit of a meta comment. I feel like the split between opal-imc.c and
imc-pmu.c is not helping the code much.

We end up with the imc_pmu structure in a header when it really should
be private to imc-pmu.c, and we have details of perf in opal-imc.c

I haven't quite reviewed everything enough to say for certain that all
the code should be in imc-pmu.c, but that's the way I'm thinking ATM.

> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
> index ffaea0b9c13e..2a0239e2590d 100644
> --- a/arch/powerpc/include/asm/imc-pmu.h
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -91,6 +91,11 @@ struct imc_pmu {
>  	const struct attribute_group *attr_groups[4];
>  };
>  
> +/* In-Memory Collection Counters Type */
> +enum {
> +	IMC_COUNTER_PER_CHIP            = 0x10,
> +};

Who decides on that value? It looks like it comes from the device tree,
so this should at least have a comment explaining that. It should
probably be called IMC_TYPE_CHIP or something though to make it more
obvious that it's one of the legal values for a "type" property.

> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
> index 5b1045c81af4..839c25718110 100644
> --- a/arch/powerpc/platforms/powernv/opal-imc.c
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -34,9 +34,437 @@
>  #include <asm/cputable.h>
>  #include <asm/imc-pmu.h>
>  
> +struct imc_pmu *per_nest_pmu_arr[IMC_MAX_PMUS];
> +
> +static int imc_event_prop_update(char *name, struct imc_events *events)
> +{
> +	char *buf;
> +
> +	if (!events || !name)
> +		return -EINVAL;

Unless there's a reason to expect that to happen in normal operation we
usually avoid explicit NULL checks everywhere. The CPU will catch them
for you.

> +
> +	/* memory for content */
> +	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);

We shouldn't need to allocate the maximum. The caller knows how much
space they'll need.

> +	if (!buf)
> +		return -ENOMEM;
> +
> +	events->ev_name = name;
> +	events->ev_value = buf;
> +	return 0;
> +}
> +
> +static int imc_event_prop_str(struct property *pp, char *name,
> +			      struct imc_events *events)
> +{
> +	int ret;
> +
> +	ret = imc_event_prop_update(name, events);
> +	if (ret)
> +		return ret;
> +
> +	if (!pp->value || (strnlen(pp->value, pp->length) == pp->length) ||
> +	   (pp->length > IMC_MAX_NAME_VAL_LEN))
> +		return -EINVAL;
> +	strncpy(events->ev_value, (const char *)pp->value, pp->length);

We shouldn't be passing struct property around and doing these strnlen()
etc. checks everywhere.

There are device tree helpers for reading strings.

> +
> +	return 0;
> +}
> +
> +static int imc_event_prop_val(char *name, u32 val,
> +			      struct imc_events *events)
> +{
> +	int ret;
> +
> +	ret = imc_event_prop_update(name, events);
> +	if (ret)
> +		return ret;
> +	snprintf(events->ev_value, IMC_MAX_NAME_VAL_LEN, "event=0x%x", val);

kasprintf() is what you want here.

Though this function only has one caller so should probably just go away.

But, the syntax here "event=0x%x" is dictated by the perf attribute
code, so that's a detail that should not be in this file.

Either the parsing should happen in the PMU code (probably), or this
code should just give a list of strings to the PMU code and then it
should do things like adding the "event=" prefix when it creates the
attributes.
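
For readers less familiar with kasprintf(): it sizes the allocation from
the formatted result instead of a fixed maximum. A userspace sketch of
the same idea follows, using two vsnprintf() passes. xasprintf() is a
name made up for this example and is not a kernel or libc function.

```c
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns a malloc()ed formatted string, or NULL */
char *xasprintf(const char *fmt, ...)
{
	va_list ap, aq;
	char *buf;
	int len;

	va_start(ap, fmt);
	va_copy(aq, ap);

	/* First pass only measures; nothing is written */
	len = vsnprintf(NULL, 0, fmt, ap);
	va_end(ap);
	if (len < 0) {
		va_end(aq);
		return NULL;
	}

	/* Second pass formats into a buffer of exactly the right size */
	buf = malloc((size_t)len + 1);
	if (buf)
		vsnprintf(buf, (size_t)len + 1, fmt, aq);
	va_end(aq);
	return buf;
}
```

With this shape the caller never has to guess a maximum length, and the
result cannot overflow its buffer; the caller owns the returned string
and frees it.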

> +
> +	return 0;
> +}
> +
> +static int set_event_property(struct property *pp, char *event_prop,
> +			      struct imc_events *events, char *ev_name)
> +{
> +	char *buf;
> +	int ret;

Again too much passing around of struct property.

Ideally we should never use struct property, and instead use the device
tree helpers to extract the values we need immediately.

> +	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);

Overly large allocation.

> +	if (!buf)
> +		return -ENOMEM;
> +
> +	sprintf(buf, "%s.%s", ev_name, event_prop);

That could overflow the buffer AFAICS, but again use kasprintf().

Also this syntax comes from the PMU code too, right?

> +	ret = imc_event_prop_str(pp, buf, events);
> +	if (ret) {
> +		if (events->ev_name)
> +			kfree(events->ev_name);
> +		if (events->ev_value)
> +			kfree(events->ev_value);

You don't need to check before calling kfree.

So just:
                kfree(events->ev_name);
                kfree(events->ev_value);

Having said that, this doesn't seem like the right place to be freeing
those. Leave it to the caller.

> +	}
> +	return ret;
> +}
> +
> +/*
> + * imc_events_node_parser: Parse the event node "dev" and assign the parsed
> + *                         information to event "events".
> + *
> + * Parses the "reg", "scale" and "unit" properties of this event.
> + * "reg" gives us the event offset in the counter memory.
> + */
> +static int imc_events_node_parser(struct device_node *dev,
> +				  struct imc_events *events,
> +				  struct property *event_scale,
> +				  struct property *event_unit,
> +				  struct property *name_prefix,
> +				  u32 reg, int pmu_domain)
> +{
> +	struct property *name, *pp;
> +	char *ev_name;
> +	u32 val;
> +	int idx = 0, ret;

Whenever possible you should defer initialisation of variables until you
need to. So in this case just before the for_each loop below.

> +	if (!dev)
> +		goto fail;

That can't happen as the code is currently written. Again the CPU will
catch it for you if it's NULL.

> +	/* Check for "event-name" property, which is the perfix for event names */
> +	name = of_find_property(dev, "event-name", NULL);
> +	if (!name)
> +		return -ENODEV;
> +
> +	if (!name->value ||
> +	  (strnlen(name->value, name->length) == name->length) ||
> +	  (name->length > IMC_MAX_NAME_VAL_LEN))
> +		return -EINVAL;
> +
> +	ev_name = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
> +	if (!ev_name)
> +		return -ENOMEM;
> +
> +	snprintf(ev_name, IMC_MAX_NAME_VAL_LEN, "%s%s",
> +		 (char *)name_prefix->value,
> +		 (char *)name->value);

All of the above can become:

	const char *s;

	if (of_property_read_string(dev, "event-name", &s))
		return -ENODEV;

	ev_name = kasprintf(GFP_KERNEL, "%s%s", name_prefix, s);
	if (!ev_name)
		return -ENOMEM;

Where name_prefix should be a const char *, not a struct property.

> +	/*
> +	 * Parse each property of this event node "dev". Property "reg" has
> +	 * the offset which is assigned to the event name. Other properties
> +	 * like "scale" and "unit" are assigned to event.scale and event.unit
> +	 * accordingly.
> +	 */
> +	for_each_property_of_node(dev, pp) {

I don't see why you're using for_each_property_of_node() here.

And in fact it can lead to a bug ...

> +		/*
> +		 * If there is an issue in parsing a single property of
> +		 * this event, we just clean up the buffers, but we still
> +		 * continue to parse. TODO: This could be rewritten to skip the
> +		 * entire event node incase of parsing issues, but that can be
> +		 * done later.
> +		 */
> +		if (strncmp(pp->name, "reg", 3) == 0) {
> +			of_property_read_u32(dev, pp->name, &val);
> +			val += reg;
> +			ret = imc_event_prop_val(ev_name, val, &events[idx]);
> +			if (ret) {
> +				if (events[idx].ev_name)
> +					kfree(events[idx].ev_name);
> +				if (events[idx].ev_value)
> +					kfree(events[idx].ev_value);
> +				goto fail;
> +			}
> +			idx++;
> +			/*
> +			 * If the common scale and unit properties available,
> +			 * then, assign them to this event
> +			 */
> +			if (event_scale) {
> +				ret = set_event_property(event_scale, "scale",
> +							 &events[idx],
> +							 ev_name);
> +				if (ret)
> +					goto fail;
> +				idx++;
> +			}
> +			if (event_unit) {
> +				ret = set_event_property(event_unit, "unit",
> +							 &events[idx],
> +							 ev_name);
> +				if (ret)
> +					goto fail;
> +				idx++;
> +			}
> +		} else if (strncmp(pp->name, "unit", 4) == 0) {
> +			/*
> +			 * The event's unit and scale properties can override the
> +			 * PMU's event and scale properties, if present.
> +			 */

.. because the order in which you discover the properties is not well
defined, the device tree properties may appear in any order.

So the event's unit and scale may or may not override the PMU's unit
and scale, depending on what order they're discovered in.

Though even if you did find them in order, they don't override the
inherited value, they just create a new attribute with the same name.

Which means sysfs will complain about the duplicate attribute names:
  sysfs: cannot create duplicate filename '/devices/nest_mcs01/events/PM_MCS01_64B_RD_OR_WR_DISP_PORT01.scale'

Also the events array is sized (below in imc_events_setup()) as:

	nr_children = get_nr_children(dir) * 3;

Which doesn't account for inherited scale/unit, which means we overflow
the events array and corrupt memory :}

  Unable to handle kernel paging request for data at address 0x313053434d5f4d50
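
One way to make the override independent of property order is to look
up the event's own value by name first, and fall back to the PMU's value
only when the event has none. The userspace sketch below uses a made-up
property list; struct prop, find_prop() and resolve_prop() are
hypothetical and not device tree API.

```c
#include <stddef.h>
#include <string.h>

struct prop {
	const char *name;
	const char *value;
};

/* Look a property up by name; its position in the list does not matter */
const char *find_prop(const struct prop *props, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(props[i].name, name) == 0)
			return props[i].value;
	return NULL;
}

/* The event's own value wins; otherwise inherit from the PMU (may be NULL) */
const char *resolve_prop(const struct prop *ev, size_t n,
			 const char *name, const char *inherited)
{
	const char *own = find_prop(ev, n, name);

	return own ? own : inherited;
}
```

Each event then ends up with at most one scale and one unit, so there is
no second attribute with the same name for sysfs to reject.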

> +			ret = set_event_property(pp, "unit", &events[idx],
> +						 ev_name);
> +			if (ret)
> +				goto fail;
> +			idx++;
> +		} else if (strncmp(pp->name, "scale", 5) == 0) {
> +			ret = set_event_property(pp, "scale", &events[idx],
> +						 ev_name);
> +			if (ret)
> +				goto fail;
> +			idx++;
> +		}
> +	}
> +
> +	return idx;
> +fail:
> +	return -EINVAL;
> +}


I think it might work better if the order of things is reversed here.
Instead of walking the device tree to find a list of strings that are
then passed to the PMU, it would work better I think if when the PMU is
registering, it walks the device tree (or calls a helper) to get each
event, and creates an attribute from them.

Either way, it should end up looking more like this:

struct imc_event
{
	char *name;
	char *scale;
	char *unit;
	u32 value;
};

struct imc_event *imc_parse_event(struct device_node *np, const char *scale,
				  const char *unit, const char *prefix)
{
	struct imc_event *event;
	const char *s;

	event = kzalloc(sizeof(*event), GFP_KERNEL);
	if (!event)
		return NULL;

	if (of_property_read_u32(np, "reg", &event->value))
		goto error;

	if (of_property_read_string(np, "event-name", &s))
		goto error;

	event->name = kasprintf(GFP_KERNEL, "%s%s", prefix, s);
	if (!event->name)
		goto error;

	if (of_property_read_string(np, "scale", &s))
		s = scale;

	if (s) {
		event->scale = kstrdup(s, GFP_KERNEL);
		if (!event->scale)
			goto error;
	}

	if (of_property_read_string(np, "unit", &s))
		s = unit;

	if (s) {
		event->unit = kstrdup(s, GFP_KERNEL);
		if (!event->unit)
			goto error;
	}

	return event;
error:
	kfree(event->unit);
	kfree(event->scale);
	kfree(event->name);
	kfree(event);

	return NULL;
}



...
> +
> +/*
> + * imc_events_setup() : First finds the event node for the pmu and
> + *                      gets the number of supported events, then
> + * allocates memory for the same and parse the events.
> + */
> +static int imc_events_setup(struct device_node *parent,
> +					   int pmu_index,
> +					   struct imc_pmu *pmu_ptr,
> +					   u32 prop,
> +					   int *idx)
> +{
> +	struct device_node *ev_node = NULL, *dir = NULL;
> +	u32 reg;
> +	struct property *scale_pp, *unit_pp, *name_prefix;
> +	int ret = 0, nr_children = 0;
> +
> +	/*
> +	 * Fetch the actual node where the events for this PMU exist.
> +	 */
> +	dir = of_find_node_by_phandle(prop);
> +	if (!dir)
> +		return -ENODEV;

dir is a strange name for a device_node; np is most common.

> +	/*
> +	 * Get the maximum no. of events in this node.
> +	 * Multiply by 3 to account for .scale and .unit properties
> +	 * This number suggests the amount of memory needed to setup the
> +	 * events for this pmu.
> +	 */
> +	nr_children = get_nr_children(dir) * 3;

Doesn't account for inherited unit/scale as mentioned above.

Using the struct I suggested:

struct imc_event
{
	char *name;
	char *scale;
	char *unit;
	u32 value;
};

We don't have to do any guessing, we can just count the number of child
nodes and know that's exactly how many structs we'll need.
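
As a rough illustration of that counting, here is a userspace model
(plain C, not kernel code). struct ev_node, struct event_model and
build_events() are made-up names for this sketch. In the kernel the
child count would come from of_get_child_count() and the strings from
of_property_read_string(). It allocates exactly one struct per child
and lets an event inherit the PMU-level scale when it has none of its
own:

```c
#include <stdlib.h>

/* Hypothetical stand-in for one child of the events node. */
struct ev_node {
	const char *name;
	const char *scale;	/* NULL: no "scale" property on this node */
	struct ev_node *sibling;
};

/* Hypothetical, trimmed-down version of struct imc_event. */
struct event_model {
	const char *name;
	const char *scale;
};

size_t count_children(const struct ev_node *child)
{
	size_t n = 0;

	for (; child; child = child->sibling)
		n++;

	return n;
}

/* One struct per child node; scale falls back to the PMU-wide default. */
struct event_model *build_events(const struct ev_node *child,
				 const char *pmu_scale, size_t *nr)
{
	struct event_model *events;
	size_t i = 0;

	*nr = count_children(child);
	if (!*nr)
		return NULL;

	events = calloc(*nr, sizeof(*events));
	if (!events) {
		*nr = 0;
		return NULL;
	}

	for (; child; child = child->sibling, i++) {
		events[i].name = child->name;
		events[i].scale = child->scale ? child->scale : pmu_scale;
	}

	return events;
}
```

The array length equals the child count whether or not any scale or
unit is inherited, so no multiplier is needed.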

> +	pmu_ptr->events = kzalloc((sizeof(struct imc_events) * nr_children),
> +			 GFP_KERNEL);
> +	if (!pmu_ptr->events)
> +		return -ENOMEM;
> +
> +	/*
> +	 * Check if there is a common "scale" and "unit" properties inside
> +	 * the PMU node for all the events supported by this PMU.
> +	 */
> +	scale_pp = of_find_property(parent, "scale", NULL);
> +	unit_pp = of_find_property(parent, "unit", NULL);
> +
> +	/*
> +	 * Get the event-prefix property from the PMU node
> +	 * which needs to be attached with the event names.
> +	 */
> +	name_prefix = of_find_property(parent, "events-prefix", NULL);
> +	if (!name_prefix)
> +		goto free_events;
> +
> +	/*
> +	 * "reg" property gives out the base offset of the counters data
> +	 * for this PMU.
> +	 */
> +	of_property_read_u32(parent, "reg", &reg);

Error checking.

> +
> +	if (!name_prefix->value ||
> +	   (strnlen(name_prefix->value, name_prefix->length) == name_prefix->length) ||
> +	   (name_prefix->length > IMC_MAX_NAME_VAL_LEN))
> +		goto free_events;

of_property_read_string()

> +
> +	/* Loop through event nodes */
> +	for_each_child_of_node(dir, ev_node) {
> +		ret = imc_events_node_parser(ev_node, &pmu_ptr->events[*idx], scale_pp,
> +				unit_pp, name_prefix, reg, pmu_ptr->domain);
> +		if (ret < 0) {
> +			/* Unable to parse this event */
> +			if (ret == -ENOMEM)
> +				goto free_events;
> +			continue;
> +		}

If this was being called from the PMU initialisation path, at this
point would take the 

> +
> +		/*
> +		 * imc_event_node_parser will return number of
> +		 * event entries created for this. This could include
> +		 * event scale and unit files also.
> +		 */
> +		*idx += ret;
> +	}
> +	return 0;
> +
> +free_events:
> +	imc_free_events(pmu_ptr->events, *idx);
> +	return -ENODEV;
> +
> +}
> +
> +/* imc_get_mem_addr_nest: Function to get nest counter memory region for each chip */
> +static int imc_get_mem_addr_nest(struct device_node *node,
> +				 struct imc_pmu *pmu_ptr,
> +				 u32 offset)
> +{
> +	int nr_chips = 0, i, j;
> +	u64 *base_addr_arr, baddr;
> +	u32 *chipid_arr, size = pmu_ptr->counter_mem_size, pages;
> +
> +	nr_chips = of_property_count_u32_elems(node, "chip-id");
> +	if (!nr_chips)
> +		return -ENODEV;

of_property_count_u32_elems() returns -EINVAL if the property is not
found, and some other negative error codes too.

Which means if (!-22) is false, and we go on to kzalloc(8 * -22, ...)
bytes of memory below :)
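
To make the arithmetic concrete, a small userspace sketch (the function
names are made up for illustration, and uint64_t stands in for u64).
The first helper shows the size the current code would ask for, the
second shows the kind of check I'd expect instead:

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * sizeof() yields a size_t, so the int count is converted to an
 * unsigned type before the multiply. The cast only makes that
 * implicit conversion visible.
 */
size_t unchecked_alloc_size(int nr_chips)
{
	return sizeof(uint64_t) * (size_t)nr_chips;
}

/* Treat every non-positive count as a failure. */
int validate_chip_count(int nr_chips)
{
	if (nr_chips < 0)
		return nr_chips;	/* propagate the error code */
	if (nr_chips == 0)
		return -ENODEV;

	return 0;
}
```

With nr_chips == -22 the unchecked size wraps around to a value just
below the top of the size_t range, rather than anything sensible.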

> +	base_addr_arr = kzalloc((sizeof(u64) * nr_chips), GFP_KERNEL);
> +	chipid_arr = kzalloc((sizeof(u32) * nr_chips), GFP_KERNEL);
> +	if (!base_addr_arr || !chipid_arr)
> +		return -ENOMEM;

If one is allocated but not the other then we leak memory here. You
should be able to check for each case and just goto error, where you
free everything.
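
Something along these lines, sketched in userspace C with calloc() and
free() standing in for kzalloc() and kfree(). setup_chip_arrays() is a
hypothetical name, and the middle of the function is elided:

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

int setup_chip_arrays(size_t nr_chips)
{
	uint64_t *base_addr_arr;
	uint32_t *chipid_arr;
	int ret = -ENOMEM;

	if (!nr_chips)
		return -EINVAL;

	base_addr_arr = calloc(nr_chips, sizeof(*base_addr_arr));
	chipid_arr = calloc(nr_chips, sizeof(*chipid_arr));
	if (!base_addr_arr || !chipid_arr)
		goto out;

	/* ... read the properties and copy what is needed elsewhere ... */

	ret = 0;
out:
	/* free(NULL) is a no-op, as is kfree(NULL), so no extra checks. */
	free(chipid_arr);
	free(base_addr_arr);
	return ret;
}
```

Because both arrays are only temporaries, the same exit path can serve
the success case too.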

> +
> +	of_property_read_u32_array(node, "chip-id", chipid_arr, nr_chips);

Need error handling here, yes it should succeed but check anyway.

> +	of_property_read_u64_array(node, "base-addr", base_addr_arr, nr_chips);

If this fails we end up with base_addr_arr full of zeroes, and so below
we point mem_info[] at (0 + offset), which is probably the kernel.

> +	pmu_ptr->mem_info = kzalloc((sizeof(struct imc_mem_info) * nr_chips), GFP_KERNEL);
> +	if (!pmu_ptr->mem_info) {
> +		if (base_addr_arr)
> +			kfree(base_addr_arr);
> +		if (chipid_arr)
> +			kfree(chipid_arr);
> +
> +		return -ENOMEM;
> +		}
> +
> +	for (i = 0; i < nr_chips; i++) {
> +		pmu_ptr->mem_info[i].id = chipid_arr[i];
> +		baddr = base_addr_arr[i] + offset;
> +		for (j = 0; j < (size/PAGE_SIZE); j++) {
> +			pages = PAGE_SIZE * j;
> +			pmu_ptr->mem_info[i].vbase[j] = phys_to_virt(baddr + pages);
> +		}
> +	}

In the success case we leak base_addr_arr and chipid_arr don't we?

> +	return 0;
> +}
> +
> +/*
> + * imc_pmu_create : Takes the parent device which is the pmu unit, pmu_index
> + *		    and domain as the inputs.
> + * Allocates memory for the pmu, sets up its domain (NEST), and
> + * calls imc_events_setup() to allocate memory for the events supported
> + * by this pmu. Assigns a name for the pmu.
> + *
> + * If everything goes fine, it calls, init_imc_pmu() to setup the pmu device
> + * and register it.
> + */
> +static int imc_pmu_create(struct device_node *parent, int pmu_index, int domain)
> +{
> +	u32 prop = 0;
> +	struct property *pp;
> +	char *buf;
> +	int idx = 0, ret = 0;
> +	struct imc_pmu *pmu_ptr;
> +	u32 offset;
> +
> +	if (!parent)
> +		return -EINVAL;

Shouldn't be required.

> +	/* memory for pmu */
> +	pmu_ptr = kzalloc(sizeof(struct imc_pmu), GFP_KERNEL);
> +	if (!pmu_ptr)
> +		return -ENOMEM;
> +
> +	pmu_ptr->domain = domain;
> +
> +	/* Needed for hotplug/migration */
> +	per_nest_pmu_arr[pmu_index] = pmu_ptr;

You don't clear it on error?

> +	pp = of_find_property(parent, "name", NULL);
> +	if (!pp) {
> +		ret = -ENODEV;
> +		goto free_pmu;
> +	}
> +
> +	if (!pp->value ||
> +	   (strnlen(pp->value, pp->length) == pp->length) ||
> +	   (pp->length > IMC_MAX_NAME_VAL_LEN)) {
> +		ret = -EINVAL;
> +		goto free_pmu;
> +	}
> +
> +	buf = kzalloc(IMC_MAX_NAME_VAL_LEN, GFP_KERNEL);
> +	if (!buf) {
> +		ret = -ENOMEM;
> +		goto free_pmu;
> +	}
> +	/* Save the name to register it later */
> +	sprintf(buf, "nest_%s", (char *)pp->value);
> +	pmu_ptr->pmu.name = (char *)buf;

of_property_read_string() / kasprintf() again.
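
As far as I can tell the sprintf() above can also overrun buf: the
length check allows a name of up to IMC_MAX_NAME_VAL_LEN bytes
including the NUL, and "nest_" adds five more. Sizing the buffer from
the formatted length avoids that. Below is a userspace stand-in for
what kasprintf(GFP_KERNEL, "%s%s", prefix, name) does. prefixed_name()
is a made-up helper for this sketch, not an existing kernel function:

```c
#include <stdio.h>
#include <stdlib.h>

/* Allocate exactly as many bytes as the formatted string needs. */
char *prefixed_name(const char *prefix, const char *name)
{
	int len = snprintf(NULL, 0, "%s%s", prefix, name);
	char *buf;

	if (len < 0)
		return NULL;

	buf = malloc((size_t)len + 1);
	if (!buf)
		return NULL;

	snprintf(buf, (size_t)len + 1, "%s%s", prefix, name);
	return buf;
}
```

The caller owns the returned string and frees it when the PMU goes
away.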

> +	if (of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size))
> +		pmu_ptr->counter_mem_size = 0;
> +
> +	if (!of_property_read_u32(parent, "offset", &offset)) {
> +		if (imc_get_mem_addr_nest(parent, pmu_ptr, offset))

You don't set ret here, which means we return 0 incorrectly.

> +			goto free_pmu;
> +		pmu_ptr->imc_counter_mmaped = 1;
> +	}
> +
> +	/*
> +	 * "events" property inside a PMU node contains the phandle value
> +	 * for the actual events node. The "events" node for the IMC PMU
> +	 * is not in this node, rather inside "imc-counters" node, since,
> +	 * we want to factor out the common events (thereby, reducing the
> +	 * size of the device tree)
> +	 */
> +	if (!of_property_read_u32(parent, "events", &prop)) {
> +		if (prop)
> +			imc_events_setup(parent, pmu_index, pmu_ptr, prop, &idx);

You don't need to check if (prop) here, of_find_node_by_phandle() will
handle it for you.

> +	}
> +	return 0;
> +
> +free_pmu:
> +	if (pmu_ptr)
> +		kfree(pmu_ptr);

        kfree(pmu_ptr);

> +	return ret;
> +}
> +
>  static int opal_imc_counters_probe(struct platform_device *pdev)
>  {
>  	struct device_node *imc_dev = NULL;
> +	int pmu_count = 0, domain;
> +	u32 type;
>  
>  	if (!pdev || !pdev->dev.of_node)
>  		return -ENODEV;
> @@ -50,7 +478,16 @@ static int opal_imc_counters_probe(struct platform_device *pdev)
>  	imc_dev = pdev->dev.of_node;
>  	if (!imc_dev)
>  		return -ENODEV;
> -
> +	for_each_compatible_node(imc_dev, NULL, IMC_DTB_UNIT_COMPAT) {
> +		if (of_property_read_u32(imc_dev, "type", &type))
> +			continue;

That should at least get a pr_debug(), you found a compatible node but it
was missing a required property.

> +		if (type == IMC_COUNTER_PER_CHIP)
> +			domain = IMC_DOMAIN_NEST;

Why do we have type and domain? Why not just use type?

> +		else
> +			continue;
> +		if (!imc_pmu_create(imc_dev, pmu_count, domain))
> +			pmu_count++;

Where do we use pmu_count ?

> +	}
>  	return 0;
>  }


cheers

* Re: [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module
  2017-07-03  9:37 ` [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
@ 2017-07-07  6:53   ` Michael Ellerman
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2017-07-07  6:53 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Hi Maddy/Anju,

Comments below :)

Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> Code to create platform device for the IMC counters.
> Paltform devices are created based on the IMC compatibility
> string.
>
> New Config flag "CONFIG_HV_PERF_IMC_CTRS" add to contain the
> IMC counter changes.

I don't think we need a separate config, it can just use
CONFIG_PPC_POWERNV.

I don't think we'll ever want to turn it off for powernv, unless we're
trying to build a small kernel, in which case we'll turn off PERF
entirely.

> diff --git a/arch/powerpc/platforms/powernv/opal-imc.c b/arch/powerpc/platforms/powernv/opal-imc.c
> new file mode 100644
> index 000000000000..5b1045c81af4
> --- /dev/null
> +++ b/arch/powerpc/platforms/powernv/opal-imc.c
> @@ -0,0 +1,73 @@
> +/*
> + * OPAL IMC interface detection driver
> + * Supported on POWERNV platform
> + *
> + * Copyright	(C) 2017 Madhavan Srinivasan, IBM Corporation.
> + *		(C) 2017 Anju T Sudhakar, IBM Corporation.
> + *		(C) 2017 Hemant K Shaw, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or later version.
> + *
> + * This program is distributed in the hope that it will be useful,
> + * but WITHOUT ANY WARRANTY; without even the implied warranty of
> + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> + * GNU General Public License for more details.

We usually don't include that part in every file.

> + */
> +#include <linux/kernel.h>
> +#include <linux/module.h>
> +#include <linux/platform_device.h>
> +#include <linux/miscdevice.h>
> +#include <linux/fs.h>
> +#include <linux/of.h>
> +#include <linux/of_address.h>
> +#include <linux/of_platform.h>
> +#include <linux/poll.h>
> +#include <linux/mm.h>
> +#include <linux/slab.h>
> +#include <linux/crash_dump.h>
> +#include <asm/opal.h>
> +#include <asm/io.h>
> +#include <asm/uaccess.h>
> +#include <asm/cputable.h>
> +#include <asm/imc-pmu.h>
> +
> +static int opal_imc_counters_probe(struct platform_device *pdev)
> +{
> +	struct device_node *imc_dev = NULL;
> +
> +	if (!pdev || !pdev->dev.of_node)
> +		return -ENODEV;

We don't need that level of paranoia :)

> +	/*
> +	 * Check whether this is kdump kernel. If yes, just return.
> +	 */
> +	if (is_kdump_kernel())
> +		return -ENODEV;

Hmm, that's a bit unusual. Is there any particular reason to do that for
this driver?

> +	imc_dev = pdev->dev.of_node;
> +	if (!imc_dev)
> +		return -ENODEV;
> +
> +	return 0;
> +}
> +
> +static const struct of_device_id opal_imc_match[] = {
> +	{ .compatible = IMC_DTB_COMPAT },
> +	{},
> +};
> +
> +static struct platform_driver opal_imc_driver = {
> +	.driver = {
> +		.name = "opal-imc-counters",
> +		.of_match_table = opal_imc_match,
> +	},
> +	.probe = opal_imc_counters_probe,
> +};
> +

This can't be built as a module, so it should not be using MODULE macros.

> +MODULE_DEVICE_TABLE(of, opal_imc_match);

Drop that.

> +module_platform_driver(opal_imc_driver);

Use builtin_platform_driver().

> +MODULE_DESCRIPTION("PowerNV OPAL IMC driver");
> +MODULE_LICENSE("GPL");

Drop those.

> diff --git a/arch/powerpc/platforms/powernv/opal.c b/arch/powerpc/platforms/powernv/opal.c
> index 59684b4af4d1..fbdca259ea76 100644
> --- a/arch/powerpc/platforms/powernv/opal.c
> +++ b/arch/powerpc/platforms/powernv/opal.c
> @@ -705,6 +707,17 @@ static void opal_pdev_init(const char *compatible)
>  		of_platform_device_create(np, NULL, NULL);
>  }
>  
> +#ifdef CONFIG_HV_PERF_IMC_CTRS
> +static void __init opal_imc_init_dev(void)
> +{
> +	struct device_node *np;
> +
> +	np = of_find_compatible_node(NULL, NULL, IMC_DTB_COMPAT);
> +	if (np)
> +		of_platform_device_create(np, NULL, NULL);
> +}
> +#endif

That doesn't need the #ifdef.

>  static int kopald(void *unused)
>  {
>  	unsigned long timeout = msecs_to_jiffies(opal_heartbeat) + 1;
> @@ -778,6 +791,11 @@ static int __init opal_init(void)
>  	/* Setup a heatbeat thread if requested by OPAL */
>  	opal_init_heartbeat();
>  
> +#ifdef CONFIG_HV_PERF_IMC_CTRS
> +	/* Detect IMC pmu counters support and create PMUs */
> +	opal_imc_init_dev();
> +#endif
> +

Neither here.

>  	/* Create leds platform devices */
>  	leds = of_find_node_by_path("/ibm,opal/leds");
>  	if (leds) {
> -- 
> 2.11.0


cheers

* Re: [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC
  2017-07-03  9:37 ` [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
@ 2017-07-07  9:26   ` Michael Ellerman
  0 siblings, 0 replies; 11+ messages in thread
From: Michael Ellerman @ 2017-07-07  9:26 UTC (permalink / raw)
  To: Anju T Sudhakar
  Cc: linux-kernel, linuxppc-dev, ego, bsingharora, anton, sukadev,
	mikey, stewart, dja, eranian, hemant, maddy, anju

Hi Maddy/Anju,

Anju T Sudhakar <anju@linux.vnet.ibm.com> writes:
> From: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
>
> Create a new header file to add the data structures and
> macros needed for In-Memory Collection (IMC) counter support.
>
> Signed-off-by: Anju T Sudhakar <anju@linux.vnet.ibm.com>
> Signed-off-by: Hemant Kumar <hemant@linux.vnet.ibm.com>
> Signed-off-by: Madhavan Srinivasan <maddy@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/imc-pmu.h | 99 ++++++++++++++++++++++++++++++++++++++
>  1 file changed, 99 insertions(+)
>  create mode 100644 arch/powerpc/include/asm/imc-pmu.h
>
> diff --git a/arch/powerpc/include/asm/imc-pmu.h b/arch/powerpc/include/asm/imc-pmu.h
> new file mode 100644
> index 000000000000..ffaea0b9c13e
> --- /dev/null
> +++ b/arch/powerpc/include/asm/imc-pmu.h
> @@ -0,0 +1,99 @@
> +#ifndef PPC_POWERNV_IMC_PMU_DEF_H
> +#define PPC_POWERNV_IMC_PMU_DEF_H
> +
> +/*
> + * IMC Nest Performance Monitor counter support.
> + *
> + * Copyright (C) 2017 Madhavan Srinivasan, IBM Corporation.
> + *           (C) 2017 Anju T Sudhakar, IBM Corporation.
> + *           (C) 2017 Hemant K Shaw, IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or later version.
> + */
> +
> +#include <linux/perf_event.h>
> +#include <linux/slab.h>
> +#include <linux/of.h>
> +#include <linux/io.h>
> +#include <asm/opal.h>
> +
> +/*
> + * For static allocation of some of the structures.
> + */
> +#define IMC_MAX_PMUS			32
> +
> +/*
> + * This macro is used for memory buffer allocation of
> + * event names and event string
> + */
> +#define IMC_MAX_NAME_VAL_LEN		96
> +
> +/*
> + * Currently Microcode supports a max of 256KB of counter memory
> + * in the reserved memory region. Max pages to mmap (considering 4K PAGESIZE).
> + */
> +#define IMC_MAX_PAGES			64

Ideally that sort of detail comes from the device tree. Otherwise old
kernels will be unable to run on new hardware which supports more memory.

Actually, looking at where we use it, it seems like we don't need it to
come from the device tree.

Seems core IMC only ever uses one page.

Thread IMC gets the size indirectly via the device tree:

	if (of_property_read_u32(parent, "size", &pmu_ptr->counter_mem_size))

So we should be able to dynamically size vbase.
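
A rough userspace sketch of what I mean, with calloc() in place of
kzalloc(). struct imc_mem_info_dyn and the two helpers are hypothetical
names, and rounding the page count up is my assumption (the existing
loop uses size/PAGE_SIZE, which rounds down):

```c
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical variant of struct imc_mem_info with a dynamic vbase. */
struct imc_mem_info_dyn {
	uint32_t id;
	size_t nr_pages;
	uint64_t **vbase;	/* nr_pages entries, one per page */
};

int imc_mem_info_init(struct imc_mem_info_dyn *info, uint32_t id,
		      size_t counter_mem_size, size_t page_size)
{
	if (!page_size)
		return -EINVAL;

	info->id = id;
	info->vbase = NULL;
	/* Round up so a partial trailing page still gets a slot. */
	info->nr_pages = (counter_mem_size + page_size - 1) / page_size;
	if (!info->nr_pages)
		return 0;

	info->vbase = calloc(info->nr_pages, sizeof(*info->vbase));
	if (!info->vbase) {
		info->nr_pages = 0;
		return -ENOMEM;
	}

	return 0;
}

void imc_mem_info_free(struct imc_mem_info_dyn *info)
{
	free(info->vbase);
	info->vbase = NULL;
	info->nr_pages = 0;
}
```

With 256KB of counter memory and 4K pages this gives the same 64 slots
as IMC_MAX_PAGES, but it follows the "size" property if that grows.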

> +/*
> + *Compatbility macros for IMC devices
> + */
> +#define IMC_DTB_COMPAT			"ibm,opal-in-memory-counters"
> +#define IMC_DTB_UNIT_COMPAT		"ibm,imc-counters"
> +
> +/*
> + * Structure to hold memory address information for imc units.
> + */
> +struct imc_mem_info {
> +	u32 id;
> +	u64 *vbase[IMC_MAX_PAGES];
> +};

cheers

end of thread, other threads:[~2017-07-07  9:27 UTC | newest]

Thread overview: 11+ messages
2017-07-03  9:37 [PATCH v12 00/10] IMC Instrumentation Support Anju T Sudhakar
2017-07-03  9:37 ` [PATCH v12 01/10] powerpc/powernv: Data structure and macros definitions for IMC Anju T Sudhakar
2017-07-07  9:26   ` Michael Ellerman
2017-07-03  9:37 ` [PATCH v12 02/10] powerpc/powernv: Autoload IMC device driver module Anju T Sudhakar
2017-07-07  6:53   ` Michael Ellerman
2017-07-03  9:37 ` [PATCH v12 03/10] powerpc/powernv: Detect supported IMC units and its events Anju T Sudhakar
2017-07-06 13:48   ` Michael Ellerman
2017-07-03  9:37 ` [PATCH v12 04/10] powerpc/perf: Add generic IMC pmu group and event functions Anju T Sudhakar
2017-07-03  9:37 ` [PATCH v12 06/10] powerpc/powernv: Core IMC events detection Anju T Sudhakar
2017-07-03  9:37 ` [PATCH v12 08/10] powerpc/powernv: Thread " Anju T Sudhakar
2017-07-03  9:37 ` [PATCH v12 09/10] powerpc/perf: Thread IMC PMU functions Anju T Sudhakar
