linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
@ 2014-01-16 23:53 Cody P Schafer
  2014-01-16 23:53 ` [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus Cody P Schafer
                   ` (8 more replies)
  0 siblings, 9 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
performance counters: gpci ("get performance counter info") and 24x7.

The counters supplied by these interfaces are continually counting and never
need to be (and cannot be) disabled or enabled. They additionally do not
generate any interrupts. This makes them in some regards similar to software
counters, and as a result their implimentation shares some common code (which
an initial patch exposes) with the sw counters.

There is ongoing work to support transactions for each of these pmus.

Cody P Schafer (8):
  perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  perf core: export swevent hrtimer helpers
  powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  powerpc: add hv_gpci interface header
  powerpc: add 24x7 interface header
  powerpc/perf: add support for the hv gpci (get performance counter
    info) interface
  powerpc/perf: add support for the hv 24x7 interface
  powerpc/perf: add kconfig option for hypervisor provided counters

 arch/powerpc/include/asm/hv_24x7.h     | 239 ++++++++++++++++
 arch/powerpc/include/asm/hv_gpci.h     | 490 +++++++++++++++++++++++++++++++++
 arch/powerpc/include/asm/hvcall.h      |   6 +-
 arch/powerpc/perf/Makefile             |   2 +
 arch/powerpc/perf/hv-24x7.c            | 354 ++++++++++++++++++++++++
 arch/powerpc/perf/hv-gpci.c            | 235 ++++++++++++++++
 arch/powerpc/platforms/Kconfig.cputype |   6 +
 include/linux/perf_event.h             |  22 +-
 kernel/events/core.c                   |   8 +-
 9 files changed, 1356 insertions(+), 6 deletions(-)
 create mode 100644 arch/powerpc/include/asm/hv_24x7.h
 create mode 100644 arch/powerpc/include/asm/hv_gpci.h
 create mode 100644 arch/powerpc/perf/hv-24x7.c
 create mode 100644 arch/powerpc/perf/hv-gpci.c

-- 
1.8.5.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 2/8] perf core: export swevent hrtimer helpers Cody P Schafer
                   ` (7 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
generate functions to extract the relevent bits from
event->attr.config{,1,2} for use by sw-like pmus where the
'config{,1,2}' values don't map directly to hardware registers.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 17 +++++++++++++++++
 1 file changed, 17 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 2e069d1..8646e33 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -870,4 +870,21 @@ _name##_show(struct device *dev,					\
 									\
 static struct device_attribute format_attr_##_name = __ATTR_RO(_name)
 
+#define PMU_RANGE_ATTR(name, attr_var, bit_start, bit_end)		\
+PMU_FORMAT_ATTR(name, #attr_var ":" #bit_start "-" #bit_end);		\
+PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)
+
+#define PMU_RANGE_RESV(name, attr_var, bit_start, bit_end)		\
+static u64 event_get_##name##_max(void)					\
+{									\
+	int bits = (bit_end) - (bit_start) + 1;				\
+	return ((0x1ULL << (bits - 1ULL)) - 1ULL) |			\
+		(0xFULL << (bits - 4ULL));				\
+}									\
+static u64 event_get_##name(struct perf_event *event)			\
+{									\
+	return (event->attr.attr_var >> (bit_start)) &			\
+		event_get_##name##_max();				\
+}
+
 #endif /* _LINUX_PERF_EVENT_H */
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 2/8] perf core: export swevent hrtimer helpers
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
  2014-01-16 23:53 ` [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info) Cody P Schafer
                   ` (6 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

Export the swevent hrtimer helpers currently only used in events/core.c
to allow the addition of architecture specific sw-like pmus.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 include/linux/perf_event.h | 5 ++++-
 kernel/events/core.c       | 8 ++++----
 2 files changed, 8 insertions(+), 5 deletions(-)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index 8646e33..c5bc71a 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -558,7 +558,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
 				int src_cpu, int dst_cpu);
 extern u64 perf_event_read_value(struct perf_event *event,
 				 u64 *enabled, u64 *running);
-
+extern void perf_swevent_init_hrtimer(struct perf_event *event);
+extern void perf_swevent_start_hrtimer(struct perf_event *event);
+extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
+extern int perf_swevent_event_idx(struct perf_event *event);
 
 struct perf_sample_data {
 	u64				type;
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f574401..d881d1e 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -5801,7 +5801,7 @@ static int perf_swevent_init(struct perf_event *event)
 	return 0;
 }
 
-static int perf_swevent_event_idx(struct perf_event *event)
+int perf_swevent_event_idx(struct perf_event *event)
 {
 	return 0;
 }
@@ -6030,7 +6030,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
 	return ret;
 }
 
-static void perf_swevent_start_hrtimer(struct perf_event *event)
+void perf_swevent_start_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 	s64 period;
@@ -6052,7 +6052,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
 				HRTIMER_MODE_REL_PINNED, 0);
 }
 
-static void perf_swevent_cancel_hrtimer(struct perf_event *event)
+void perf_swevent_cancel_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
@@ -6064,7 +6064,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
 	}
 }
 
-static void perf_swevent_init_hrtimer(struct perf_event *event)
+void perf_swevent_init_hrtimer(struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
 
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
  2014-01-16 23:53 ` [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus Cody P Schafer
  2014-01-16 23:53 ` [PATCH 2/8] perf core: export swevent hrtimer helpers Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 4/8] powerpc: add hv_gpci interface header Cody P Schafer
                   ` (5 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hvcall.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
index d8b600b..48d6efa 100644
--- a/arch/powerpc/include/asm/hvcall.h
+++ b/arch/powerpc/include/asm/hvcall.h
@@ -269,11 +269,15 @@
 #define H_COP			0x304
 #define H_GET_MPP_X		0x314
 #define H_SET_MODE		0x31C
-#define MAX_HCALL_OPCODE	H_SET_MODE
+#define H_GET_24X7_CATALOG_PAGE 0xF078
+#define H_GET_24X7_DATA		0xF07C
+#define H_GET_PERF_COUNTER_INFO 0xF080
+#define MAX_HCALL_OPCODE	H_GET_PERF_COUNTER_INFO
 
 /* Platform specific hcalls, used by KVM */
 #define H_RTAS			0xf000
 
+
 #ifndef __ASSEMBLY__
 
 /**
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 4/8] powerpc: add hv_gpci interface header
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (2 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info) Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 5/8] powerpc: add 24x7 " Cody P Schafer
                   ` (4 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

"H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
here on) is an interface to retrieve specific performance counters and
other data from the hypervisor. All outputs have a fixed format (and
are represented as structs in this patch).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hv_gpci.h | 490 +++++++++++++++++++++++++++++++++++++
 1 file changed, 490 insertions(+)
 create mode 100644 arch/powerpc/include/asm/hv_gpci.h

diff --git a/arch/powerpc/include/asm/hv_gpci.h b/arch/powerpc/include/asm/hv_gpci.h
new file mode 100644
index 0000000..237de26
--- /dev/null
+++ b/arch/powerpc/include/asm/hv_gpci.h
@@ -0,0 +1,490 @@
+#ifndef LINUX_POWERPC_UAPI_HV_GPCI_H_
+#define LINUX_POWERPC_UAPI_HV_GPCI_H_
+
+#include <linux/types.h>
+
+/* From the document "H_GetPerformanceCounterInfo Interface" v1.06, paritialy
+ * updated with v1.07 */
+
+/* H_GET_PERF_COUNTER_INFO argument */
+struct hv_get_perf_counter_info_params {
+	__be32 counter_request; /* I */
+	__be32 starting_index;  /* IO */
+	__be16 secondary_index; /* IO */
+	__be16 returned_values; /* O */
+	__be32 detail_rc; /* O, "only for 32bit clients" */
+
+	/*
+	 * O, size each of counter_value element in bytes, only set for version
+	 * >= 0x3
+	 */
+	__be16 cv_element_size;
+
+	/* I, funny if version < 0x3 */
+	__u8 counter_info_version_in;
+
+	/* O, funny if version < 0x3 */
+	__u8 counter_info_version_out;
+	__u8 reserved[0xC];
+	__u8 counter_value[];
+} __packed;
+
+/* 8 => power8 (1.07)
+ * 6 => TLBIE  (1.07)
+ * 5 => (1.05)
+ * 4 => ?
+ * 3 => ?
+ * 2 => v7r7m0.phyp (?)
+ * 1 => v7r6m0.phyp (?)
+ * 0 => v7r{2,3,4}m0.phyp (?)
+ */
+#define COUNTER_INFO_VERSION_CURRENT 0x8
+
+/* these determine the counter_value[] layout and the meaning of starting_index
+ * and secondary_index */
+enum counter_info_requests {
+
+	/* GENERAL */
+
+	/* @starting_index: "starting" physical processor index or -1 for
+	 *                  current phyical processor. Data is only collected
+	 *                  for the processors' "primary" thread.
+	 * @secondary_index: unused
+	 */
+	CIR_dispatch_timebase_by_processor = 0x10,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 * @secondary_index: unused
+	 */
+	CIR_entitled_capped_uncapped_donated_idle_timebase_by_partition = 0x20,
+
+	/* @starting_index: starting partition id or -1 for the current logical
+	 *                  partition (virtual machine).
+	 * @secondary_index: unused
+	 */
+	CIR_run_instructions_run_cycles_by_partition = 0x30,
+
+	/* @starting_index: must be -1 (to refer to the current partition)
+	 * @secondary_index: unused
+	 */
+	CIR_system_performance_capabilities = 0x40,
+
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_abc_links = 0x50,
+
+	/* Data from this should only be considered valid if
+	 * counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_wxyz_links = 0x60,
+
+
+	/* EXPANDED */
+
+	/* Avaliable if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_gx_links = 0x70,
+
+	/* Avaliable if counter_info_version >= 0x3
+	 * @starting_index: starting hardware chip id or -1 for the current hw
+	 *		    chip id
+	 * @secondary_index: unused
+	 */
+	CIR_processor_bus_utilization_mc_links = 0x80,
+
+	/* Avaliable if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 * @secondary_index: unused
+	 */
+	CIR_processor_config = 0x90,
+
+	/* Avaliable if counter_info_version >= 0x3
+	 * @starting_index: starting physical processor or -1 for the current
+	 *                  physical processor
+	 * @secondary_index: unused
+	 */
+	CIR_current_processor_frequency = 0x91,
+
+	CIR_processor_core_utilization = 0x94,
+
+	CIR_processor_core_power_mode = 0x95,
+
+	CIR_affinity_domain_information_by_virutal_processor = 0xA0,
+
+	CIR_affinity_domain_info_by_domain = 0xB0,
+
+	CIR_affinity_domain_info_by_partition = 0xB1,
+
+	/* @starting_index: unused
+	 * @secondary_index: unused
+	 */
+	CIR_physical_memory_info = 0xC0,
+
+	CIR_processor_bus_topology = 0xD0,
+
+	CIR_partition_hypervisor_queuing_times = 0xE0,
+
+	CIR_system_hypervisor_times = 0xF0,
+
+	/* LAB */
+
+	CIR_set_mmcrh = 0x80001000,
+	CIR_get_hpmcx = 0x80002000,
+};
+
+/* counter value layout */
+struct cv_dispatch_timebase_by_processor {
+	__be64 processor_time_in_timebase_cycles;
+	__be32 hw_processor_id;
+	__be16 owning_part_id; /* 0xffff if shared or unowned */
+	__u8 processor_state;
+	__u8 version; /* unused unless counter_info_version == 0 */
+	__be32 hw_chip_id; /* -1 for "Not Installed" processors */
+	__be32 phys_module_id; /* -1 for "Not Installed" processors */
+	__be32 primary_affinity_domain_idx;
+	__be32 secondary_affinity_domain_idx;
+	__be32 processor_version;
+	__be16 logical_processor_idx;
+	__u8 reserved[0x2];
+
+	/* counter_info_version >= 0x3 || version >= 0x1 */
+	__be32 processor_id_register;
+	__be32 physical_processor_idx; /* counter_info_version >= 0x3 */
+} __packed;
+
+struct cv_timebase_by_partition {
+	__be64 partition_id;
+	__be64 entitled_cycles;
+	__be64 consumed_capped_cycles;
+	__be64 consumed_uncapped_cycles;
+	__be64 cycles_donated;
+	__be64 purr_idle_cycles;
+} __packed;
+
+struct cv_cycles_per_partition {
+	__be64 partition_id;
+	__be64 instructions_completed; /* 0 if collection is unsupported */
+	__be64 cycles; /* 0 if collection is unsupported */
+} __packed;
+
+struct cv_system_performance_capabilities {
+	/* If != 0, allowed to collect data from other partitions */
+	__u8 perf_collect_privlidged;
+
+	/* These are only valid if counter_info_version >= 0x3 */
+#define CV_CM_GA       0x1
+#define CV_CM_EXPANDED 0x2
+#define CV_CM_LAB      0x3
+	/* remainig bits are reserved */
+	__u8 capability_mask;
+	__u8 reserved[0xE];
+} __packed;
+
+struct cv_processor_bus_utilization_abc {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	__be64 total_link_cycles;
+	__be64 idle_cycles_a;
+	__be64 idle_cycles_b;
+	__be64 idle_cycles_c;
+	__u8 reserved2[0x20];
+} __packed;
+
+struct cv_processor_bus_utilization_wxyz {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	__be64 total_link_cycles;
+
+	/* Inactive links (all cycles idle) give -1 */
+	__be64 idle_cycles_w;
+	__be64 idle_cycles_x;
+	__be64 idle_cycles_y;
+	__be64 idle_cycles_z;
+	__u8 reserved2[0x28];
+} __packed;
+
+/* EXPANDED */
+
+struct cv_gx_cycles {
+	__be64 address_cycles;
+	__be64 data_cycles;
+	__be64 retries;
+	__be64 bus_cycles;
+	__be64 total_cycles;
+} __packed;
+
+struct cv_gx_cycles_io {
+	struct cv_gx_cycles in, out;
+} __packed;
+
+struct cv_processor_bus_utilization_gx {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	struct cv_gx_cycles_io gx[2];
+} __packed;
+
+struct cv_mc_counts {
+	__be64 frames;
+	__be64 reads;
+	__be64 writes;
+	__be64 total_cycles;
+} __packed;
+
+/* inactive links return 0 for all utilization data */
+struct cv_processor_bus_utilization_mc {
+	__be32 hw_chip_id;
+	__u8 reserved1[0xC];
+	struct cv_mc_counts mc[2];
+} __packed;
+
+struct cv_processor_config {
+	__be32 physical_processor_idx;
+	__be32 hw_node_id;
+	__be32 hw_card_id;
+	__be32 phys_module_id;
+	__be32 hw_chip_id;
+	__be32 hw_processor_id;
+	__be32 processor_id_register;
+
+#define CV_PS_NOT_INSTALLED 0x1
+#define CV_PS_GAURDED_OFF   0x2
+#define CV_PS_UNLICENCED    0x3
+#define CV_PS_SHARED        0x4
+#define CV_PS_BORROWED      0x5
+#define CV_PS_DEDICATED     0x6
+	__u8 processor_state;
+
+	__u8 reserved1[0x1];
+	__be16 owning_part_id;
+	__be32 processor_version;
+	__u8 reserved2[0x4];
+} __packed;
+
+struct cv_processor_frequency {
+	__be32 physical_processor_idx;
+	__be32 hw_processor_id;
+	__u8 reserved1[0x8];
+	__be32 nominal_freq_mhz;
+	__be32 current_freq_mhz;
+} __packed;
+
+struct cv_processor_core_utilization {
+	__be32 physical_processor_idx;
+	__be32 hw_processor_id;
+	__be64 cycles;
+	__be64 timebase_at_collection;
+	__be64 purr_cycles;
+	__be64 sum_of_cycles_across_threads;
+	__be64 instructions_completed;
+} __packed;
+
+struct cv_processor_core_power_mode {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+
+#define CV_PM_NONE		 0x0
+#define CV_PM_NOMINAL		 0x1
+#define CV_PM_DYNAMIC_MAX_PERF   0x2
+#define CV_PM_DYNAMIC_POWER_SAVE 0x3
+#define CV_PM_UNKNOWN		 0xF
+	__be16 power_mode;
+
+	__u8 reserved2[0x6];
+} __packed;
+
+struct cv_affinity_domain_information_by_virutal_processor {
+	__be16 partition_id;
+	__be16 virtual_processor_idx;
+	__u8 reserved1[0xC];
+	__be16 physical_processor_idx;
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__u8 reserved2[0x2];
+	__u8 reserved3[0x8];
+} __packed;
+
+struct cv_affinity_domain_info_by_domain {
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__be32 total_processor_units;
+	__be32 free_dedicated_processor_units;
+	__be32 free_shared_processor_units;
+	__be32 total_memory_lmbs;
+	__be32 free_memory_lmbs;
+	__be32 num_partitions_in_domain;
+	__u8 reserved1[0x14];
+} __packed;
+
+struct cv_affinity_domain_info_by_partition {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+	__be16 assignment_order;
+
+#define CV_PPS_UNKNOWN			      0x00
+#define CV_PPS_CONTAIN_IN_PRIMARY_DOMAIN      0x01
+#define CV_PPS_CONTAIN_IN_SECONDARY_DOMAIN    0x02
+#define CV_PPS_SPREAD_ACROSS_SECONDAY_DOMAINS 0x03
+#define CV_PPS_WHEREEVER		      0x04
+#define CV_PPS_SCRAMBLE			      0x05
+	__u8 partition_placement_spread;
+
+	__u8 parition_affinity_score;
+	__be16 num_affinity_domain_elements;
+	__be16 affinity_domain_element_size;
+	__u8 domain_elements[];
+} __packed;
+
+struct cv_affinity_domain_elem {
+	__be16 primary_affinity_domain_idx;
+	__be16 secondary_affinity_domain_idx;
+	__be32 dedicated_processor_units_allocated;
+	__be32 dedicated_memory_allocated_reserved_1;
+	__be32 dedicated_memory_allocated_reserved_2;
+	__be32 dedicated_memory_allocated_16Gb_pages;
+	__u8 reserved[0x8];
+} __packed;
+
+/* Also avaliable via `of_get_flat_dt_prop(node, "ibm,lmb-size", &l)` */
+struct cv_physical_memory_info {
+	__be64 lmb_size_in_bytes;
+	__u8 reserved1[0x18];
+} __packed;
+
+struct cv_processor_bus_topology {
+	__be32 hw_chip_id;
+	__be32 hw_node_id;
+	__be32 fabric_chip_id;
+	__u8 reserved1[0x4];
+
+#define CV_IM_A_LINK_ACTIVE (1 << 0)
+#define CV_IM_B_LINK_ACTIVE (1 << 1)
+#define CV_IM_C_LINK_ACTIVE (1 << 2)
+/* Bits 3-5 are reserved */
+#define CV_IM_ABC_LINK_WIDTH_MASK ((1 << 6) | (1 << 7))
+#define CV_IM_ABC_LINK_WIDTH_SHIFT 6
+#define CV_IM_ABC_LINK_WIDTH_8B 0x0
+#define CV_IM_ABC_LINK_WIDTH_4B 0x1
+
+#define CV_IM_W_LINK_ACTIVE (1 << 8)
+#define CV_IM_X_LINK_ACTIVE (1 << 9)
+#define CV_IM_Y_LINK_ACTIVE (1 << 10)
+#define CV_IM_Z_LINK_ACTIVE (1 << 11)
+/* Bits 12-13 are reserved */
+
+#define CV_IM_WXYZ_LINK_WIDTH_MASK ((1 << 14) | (1 << 15))
+#define CV_IM_WXYZ_LINK_WIDTH_SHIFT 14
+#define CV_IM_WXYZ_LINK_WIDTH_8B 0x0
+#define CV_IM_WXYZ_LINK_WIDTH_4B 0x1
+
+#define CV_IM_GX0_CONFIGURED (1 << 16)
+#define CV_IM_GX1_CONFIGURED (1 << 17)
+/* Bits 18-23 are reserved */
+#define CV_IM_MC0_CONFIGURED (1 << 24)
+#define CV_IM_MC1_CONFIGURED (1 << 25)
+/* Bits 26-31 are reserved */
+
+	__be32 info_mask;
+
+	__u8 hw_node_id_connected_to_a_link;
+	__u8 hw_node_id_connected_to_b_link;
+
+	__u8 reserved2[0x2];
+
+	__u8 fabric_chip_id_connected_to_w_link;
+	__u8 fabric_chip_id_connected_to_x_link;
+	__u8 fabric_chip_id_connected_to_y_link;
+	__u8 fabric_chip_id_connected_to_z_link;
+
+	__u8 reserved3[0x4];
+} __packed;
+
+struct cv_partition_hypervisor_queuing_times {
+	__be16 partition_id;
+	__u8 reserved1[0x6];
+	__be64 time_waiting_for_entitlement; /* in timebase cycles */
+	__be64 times_waited_for_entitlement;
+	__be64 time_waiting_for_physical_processor; /* in timebase cycles */
+	__be64 times_waited_for_physical_processor;
+	__be64 dispatches_on_home_processor_core;
+	__be64 dispatches_on_home_primary_affinity_domain;
+	__be64 dispatches_on_home_secondary_affinity_domain;
+	__be64 dispatches_off_home_secondary_affinity_domain;
+	__be64 dispatches_on_dedicated_processor_donating_cycles;
+} __packed;
+
+struct cv_system_hypervisor_times {
+	__be64 phyp_time_spent_to_dispatch_virtual_processors;
+	__be64 phyp_time_spent_processing_virtual_processor_timers;
+	__be64 phyp_time_spent_managing_partitions_over_entitlement;
+	__be64 time_spent_on_system_managment;
+} __packed;
+
+/* LAB */
+
+struct cv_set_mmcrh {
+	/* Only HPMC bits (40:46, 48:54) used, all others ignored
+	 * -1 = default (0x00000000_003C1200)
+	 */
+	__be64 mmcrh_value_to_set;
+};
+
+struct cv_get_hpmcx {
+	__be32 hw_processor_id;
+	__u8 reserved1[0x4];
+	__be64 mmcrh_current;
+	__be64 time_since_mmcrh_was_set;
+	__be64 hpmc1_since_current_mmcrh;
+	__be64 hpmc2_since_current_mmcrh;
+	__be64 hpmc3_since_current_mmcrh;
+	__be64 hpmc3_current;
+	__be64 hpmc4_since_current_mmcrh;
+	__be64 hpmc4_current;
+};
+
+union h_gpci_cvs {
+	/* GA */
+	struct cv_dispatch_timebase_by_processor dispatch_timebase_by_processor;
+	struct cv_timebase_by_partition timebase_by_partition;
+	struct cv_cycles_per_partition cycles_per_partition;
+	struct cv_system_performance_capabilities system_performance_capabilities;
+	struct cv_processor_bus_utilization_abc processor_bus_utilization_abc;
+	struct cv_processor_bus_utilization_wxyz processor_bus_utilization_wxyz;
+
+	/* EXPANDED */
+	struct cv_gx_cycles gx_cycles;
+	struct cv_gx_cycles_io gx_cycles_io;
+	struct cv_processor_bus_utilization_gx processor_bus_utilization_gx;
+	struct cv_mc_counts mc_counts;
+	struct cv_processor_bus_utilization_mc processor_bus_utilization_mc;
+	struct cv_processor_config processor_config;
+	struct cv_processor_frequency processor_frequency;
+	struct cv_processor_core_utilization processor_core_utilization;
+	struct cv_processor_core_power_mode processor_core_power_mode;
+	struct cv_affinity_domain_information_by_virutal_processor affinity_domain_information_by_virutal_processor;
+	struct cv_affinity_domain_info_by_domain affinity_domain_info_by_domain;
+	struct cv_affinity_domain_info_by_partition affinity_domain_info_by_partition;
+	struct cv_affinity_domain_elem affinity_domain_elem;
+	struct cv_physical_memory_info physical_memory_info;
+	struct cv_processor_bus_topology processor_bus_topology;
+	struct cv_partition_hypervisor_queuing_times partition_hypervisor_queuing_times;
+	struct cv_system_hypervisor_times system_hypervisor_times;
+
+	/* LAB */
+	struct cv_set_mmcrh set_mmcrh;
+	struct cv_get_hpmcx get_hpmcx;
+};
+
+#endif
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 5/8] powerpc: add 24x7 interface header
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (3 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 4/8] powerpc: add hv_gpci interface header Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface Cody P Schafer
                   ` (3 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
performance counters from the hypervisor. These counters do not have a
fixed format/possition and are instead documented in a "24x7 Catalog",
which is provided by the hypervisor (that interface is also documented
in this header).

This method of obtaining performance counters from the hypervisor is
intended to paritialy replace the gpci interface.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/include/asm/hv_24x7.h | 239 +++++++++++++++++++++++++++++++++++++
 1 file changed, 239 insertions(+)
 create mode 100644 arch/powerpc/include/asm/hv_24x7.h

diff --git a/arch/powerpc/include/asm/hv_24x7.h b/arch/powerpc/include/asm/hv_24x7.h
new file mode 100644
index 0000000..f77b3cc
--- /dev/null
+++ b/arch/powerpc/include/asm/hv_24x7.h
@@ -0,0 +1,239 @@
+#ifndef ARCH_POWERPC_24X7_H_
+#define ARCH_POWERPC_24X7_H_
+
+#include <linux/types.h>
+
+struct hv_24x7_request {
+	/* PHYSICAL domains require enabling via phyp/hmc. */
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP 0x01
+#define HV_24X7_PERF_DOMAIN_PHYSICAL_CORE 0x02
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CORE   0x03
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_CHIP   0x04
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_HOME_NODE   0x05
+#define HV_24X7_PERF_DOMAIN_VIRTUAL_PROCESSOR_REMOTE_NODE 0x06
+	__u8 performance_domain;
+	__u8 reserved[0x1];
+
+	/* bytes to read starting at @data_offset. must be a multiple of 8 */
+	__be16 data_size;
+
+	/*
+	 * byte offset within the perf domain to read from. must be 8 byte
+	 * aligned
+	 */
+	__be32 data_offset;
+
+	/*
+	 * only valid for VIRTUAL_PROCESSOR domains, ignored for others.
+	 * -1 means "current partition only"
+	 *  Enabling via phyp/hmc required for non-"-1" values. 0 forbidden
+	 *  unless requestor is 0.
+	 */
+	__be16 starting_lpar_ix;
+
+	/*
+	 * Ignored when @starting_lpar_ix == -1
+	 * Ignored when @performance_domain is not VIRTUAL_PROCESSOR_*
+	 * -1 means "infinite" or all
+	 */
+	__be16 max_num_lpars;
+
+	/* chip, core, or virtual processor based on @performance_domain */
+	__be16 starting_ix;
+	__be16 max_ix;
+} __packed;
+
+struct hv_24x7_request_buffer {
+	/* 0 - ? */
+	/* 1 - ? */
+#define HV_24X7_IF_VERSION_CURRENT 0x01
+	__u8 interface_version;
+	__u8 num_requests;
+	__u8 reserved[0xE];
+	struct hv_24x7_request requests[];
+} __packed;
+
+struct hv_24x7_result_element {
+	__be16 lpar_ix;
+
+	/*
+	 * represents the core, chip, or virtual processor based on the
+	 * request's @performance_domain
+	 */
+	__be16 domain_ix;
+
+	/* -1 if @performance_domain does not refer to a virtual processor */
+	__be32 lpar_cfg_instance_id;
+
+	/* size = @result_element_data_size of cointaining result. */
+	__u8 element_data[];
+} __packed;
+
+struct hv_24x7_result {
+	__u8 result_ix;
+
+	/*
+	 * 0 = not all result elements fit into the buffer, additional requests
+	 *     required
+	 * 1 = all result elements were returned
+	 */
+	__u8 results_complete;
+	__be16 num_elements_returned;
+
+	/* This is a copy of @data_size from the coresponding hv_24x7_request */
+	__be16 result_element_data_size;
+	__u8 reserved[0x2];
+
+	/* WARNING: only valid for first result element due to variable sizes
+	 *          of result elements */
+	/* struct hv_24x7_result_element[@num_elements_returned] */
+	struct hv_24x7_result_element elements[];
+} __packed;
+
+struct hv_24x7_data_result_buffer {
+	/* See versioning for request buffer */
+	__u8 interface_version;
+
+	__u8 num_results;
+	__u8 reserved[0x1];
+	__u8 failing_request_ix;
+	__be32 detailed_rc;
+	__be64 cec_cfg_instance_id;
+	__be64 catalog_version_num;
+	__u8 reserved2[0x8];
+	/* WARNING: only valid for the first result due to variable sizes of
+	 *	    results */
+	struct hv_24x7_result results[]; /* [@num_results] */
+} __packed;
+
+/* From document "24x7 Event and Group Catalog Formats Proposal" v0.14 */
+struct hv_24x7_catalog_page_0 {
+#define HV_24X7_CATALOG_MAGIC 0x32347837 /* "24x7" in ASCII */
+	__be32 magic;
+	__be32 length; /* In 4096 byte pages */
+	__u8 reserved1[4];
+	__be32 version;
+	__u8 build_time_stamp[16]; /* "YYYYMMDDHHMMSS\0\0" */
+	__u8 reserved2[32];
+	__be16 schema_data_offs; /* in 4096 byte pages */
+	__be16 schema_data_len;  /* in 4096 byte pages */
+	__be16 schema_entry_count;
+	__u8 reserved3[2];
+	__be16 group_data_offs; /* in 4096 byte pages */
+	__be16 group_data_len;  /* in 4096 byte pages */
+	__be16 group_entry_count;
+	__u8 reserved4[2];
+	__be16 formula_data_offs; /* in 4096 byte pages */
+	__be16 formula_data_len;  /* in 4096 byte pages */
+	__be16 formula_entry_count;
+	__u8 reserved5[2];
+} __packed;
+
+struct hv_24x7_event_data {
+	__be16 length; /* in bytes, must be a multiple of 16 */
+	__u8 reserved1[2];
+	__u8 domain; /* Chip = 1, Core = 2 */
+	__u8 reserved2[1];
+	__be16 event_group_record_offs; /* in bytes, must be 8 byte aligned */
+	__be16 event_group_record_len; /* in bytes */
+
+	/* in bytes, offset from event_group_record */
+	__be16 event_counter_offs;
+
+	/* verified_state, unverified_state, caveat_state, broken_state, ... */
+	__be32 flags;
+
+	__be16 primary_group_ix;
+	__be16 group_count;
+	__be16 event_name_len;
+	__u8 remainder[];
+	/* __u8 event_name[event_name_len - 2]; */
+	/* __be16 event_description_len; */
+	/* __u8 event_desc[event_description_len - 2]; */
+	/* __be16 detailed_desc_len; */
+	/* __u8 detailed_desc[detailed_desc_len - 2]; */
+} __packed;
+
+struct hv_24x7_group_data {
+	__be16 length; /* in bytes, must be multiple of 16 */
+	__u8 reserved1[2];
+	__be32 flags; /* undefined contents */
+	__u8 domain; /* Chip = 1, Core = 2 */
+	__u8 reserved2[1];
+	__be16 event_group_record_offs;
+	__be16 event_group_record_len;
+	__u8 group_schema_ix;
+	__u8 event_count; /* 1 to 16 */
+	__be16 event_ixs;
+	__be16 group_name_len;
+	__u8 remainder[];
+	/* __u8 group_name[group_name_len]; */
+	/* __be16 group_desc_len; */
+	/* __u8 group_desc[group_desc_len]; */
+} __packed;
+
+/* TODO: Schema Data */
+/* TODO: Event Counter Group Record (see the PORE/SLW workbook) */
+
+/* "Get Event Counter Group Record Schema hypervisor interface" */
+
+enum hv_24x7_grs_field_enums {
+	/* GRS_COUNTER_1 = 1
+	 * GRS_COUNTER_2 = 2
+	 * ...
+	 * GRS_COUNTER_31 = 32 // FIXME: Doc issue.
+	 */
+	GRS_COUNTER_BASE = 1,
+	GRS_COUNTER_LAST = 32,
+	GRS_TIMEBASE_UPDATE = 48,
+	GRS_TIMEBASE_FENCE = 49,
+	GRS_UPDATE_COUNT = 50,
+	GRS_MEASUREMENT_PERIOD = 51,
+	GRS_ACCUMULATED_MEASUREMENT_PERIOD = 52,
+	GRS_LAST_UPDATE_PERIOD = 53,
+	GRS_STATUS_FLAGS = 54,
+};
+
+enum hv_24x7_grs_enums {
+	GRS_CORE_SCHEMA_INDEX = 0,
+};
+
+struct hv_24x7_grs_field {
+	__be16 field_enum;
+	__be16 offs; /* in bytes, within Event Counter group record */
+	__be16 length; /* in bytes */
+	__be16 flags; /* presently unused */
+} __packed;
+
+struct hv_24x7_grs {
+	__be16 length;
+	__u8 reserved1[2];
+	__be16 descriptor;
+	__be16 version_id;
+	__u8 reserved2[6];
+	__be16 field_entry_count;
+	__u8 field_entrys[];
+} __packed;
+
+struct hv_24x7_formula_data {
+	__be32 length; /* in bytes, must be multiple of 16 */
+	__u8 reserved1[2];
+	__be32 flags; /* not yet defined */
+	__be16 group;
+	__u8 reserved2[6];
+	__be16 name_len;
+	__u8 remainder[];
+	/* __u8 name[name_len]; */
+	/* __be16 desc_len; */
+	/* __u8 desc[name_len]; */
+	/* __be16 formula_len */
+	/* __u8 formula[formula_len]; */
+} __packed;
+
+/* Formula Syntax: ie, impliment a forth interpereter. */
+/* need fast lookup of the formula names, event names, "delta-timebase",
+ * "delta-cycles", "delta-instructions", "delta-seconds" */
+/* operators: '+', '-', '*', '/', 'mod', 'rem', 'sqr', 'x^y' (XXX: pow? xor?),
+ *            'rot', 'dup' */
+
+#endif
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (4 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 5/8] powerpc: add 24x7 " Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-02-01  5:58   ` Michael Ellerman
  2014-01-16 23:53 ` [PATCH 7/8] powerpc/perf: add support for the hv 24x7 interface Cody P Schafer
                   ` (2 subsequent siblings)
  8 siblings, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

This provides a basic link between perf and hv_gpci. Notably, it does
not yet support transactions and does not list any events (they can
still be manually composed).

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-gpci.c | 235 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 235 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-gpci.c

diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
new file mode 100644
index 0000000..31d9d59
--- /dev/null
+++ b/arch/powerpc/perf/hv-gpci.c
@@ -0,0 +1,235 @@
+/*
+ * Hypervisor supplied "gpci" ("get performance counter info") performance
+ * counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+#define pr_fmt(fmt) "hv-gpci: " fmt
+
+#include <linux/module.h>
+#include <linux/perf_event.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/hv_gpci.h>
+#include <asm/io.h>
+
+/* See arch/powerpc/include/asm/hv_gpci.h for details on the hcall interface */
+
+PMU_RANGE_ATTR(request, config, 0, 31); /* u32 */
+PMU_RANGE_ATTR(starting_index, config, 32, 63); /* u32 */
+PMU_RANGE_ATTR(secondary_index, config1, 0, 15); /* u16 */
+PMU_RANGE_ATTR(counter_info_version, config1, 16, 23); /* u8 */
+PMU_RANGE_ATTR(length, config1, 24, 31); /* u8, bytes of data (1-8) */
+PMU_RANGE_ATTR(offset, config1, 32, 63); /* u32, byte offset */
+
+static struct attribute *format_attr[] = {
+	&format_attr_request.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_secondary_index.attr,
+	&format_attr_counter_info_version.attr,
+
+	&format_attr_offset.attr,
+	&format_attr_length.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attr,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	NULL,
+};
+
+static unsigned long single_gpci_request(u32 req, u32 starting_index,
+		u16 secondary_index, u8 version_in, u32 offset, u8 length,
+		u64 *value)
+{
+	unsigned long ret;
+	size_t i;
+	u64 count;
+
+	struct {
+		struct hv_get_perf_counter_info_params params;
+		union {
+			union h_gpci_cvs data;
+			uint8_t bytes[sizeof(union h_gpci_cvs)];
+		};
+	} arg = {
+		.params = {
+			.counter_request = cpu_to_be32(req),
+			.starting_index = cpu_to_be32(starting_index),
+			.secondary_index = cpu_to_be16(secondary_index),
+			.counter_info_version_in = version_in,
+		}
+	};
+
+	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			virt_to_phys(&arg), sizeof(arg));
+	if (ret) {
+		pr_devel("hcall failed: 0x%lx\n", ret);
+		return ret;
+	}
+
+	/*
+	 * we verify offset and length are within the zeroed buffer at event
+	 * init.
+	 */
+	count = 0;
+	for (i = offset; i < offset + length; i++)
+		count |= arg.bytes[i] << (i - offset);
+
+	*value = count;
+	return ret;
+}
+
+static u64 h_gpci_get_value(struct perf_event *event)
+{
+	u64 count;
+	unsigned long ret = single_gpci_request(event_get_request(event),
+					event_get_starting_index(event),
+					event_get_secondary_index(event),
+					event_get_counter_info_version(event),
+					event_get_offset(event),
+					event_get_length(event),
+					&count);
+	if (ret)
+		return 0;
+	return count;
+}
+
+static void h_gpci_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now = h_gpci_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_gpci_event_start(struct perf_event *event, int flags)
+{
+	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
+	perf_swevent_start_hrtimer(event);
+}
+
+static void h_gpci_event_stop(struct perf_event *event, int flags)
+{
+	perf_swevent_cancel_hrtimer(event);
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_gpci_event_start(event, flags);
+
+	return 0;
+}
+
+static void h_gpci_event_del(struct perf_event *event, int flags)
+{
+	h_gpci_event_stop(event, flags);
+}
+
+static void h_gpci_event_read(struct perf_event *event)
+{
+	h_gpci_event_update(event);
+}
+
+static int h_gpci_event_init(struct perf_event *event)
+{
+	u64 count;
+	u8 length;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* config2 is unused */
+	if (event->attr.config2)
+		return -EINVAL;
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	length = event_get_length(event);
+	if (length < 1 || length > 8)
+		return -EINVAL;
+
+	/* last byte within the buffer? */
+	if ((event_get_offset(event) + length) > sizeof(union h_gpci_cvs))
+		return -EINVAL;
+
+	/* check if the request works... */
+	if (single_gpci_request(event_get_request(event),
+				event_get_starting_index(event),
+				event_get_secondary_index(event),
+				event_get_counter_info_version(event),
+				event_get_offset(event),
+				length,
+				&count))
+		return -EINVAL;
+
+	/*
+	 * Some of the events are per-cpu, some per-core, some per-chip, some
+	 * are global, and some access data from other virtual machines on the
+	 * same physical machine. We can't map the cpu value without a lot of
+	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
+	 */
+	event->cpu = 0;
+
+	perf_swevent_init_hrtimer(event);
+	return 0;
+}
+
+struct pmu h_gpci_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_gpci",
+	.attr_groups = attr_groups,
+	.event_init = h_gpci_event_init,
+	.add = h_gpci_event_add,
+	.del = h_gpci_event_del,
+	.start = h_gpci_event_start,
+	.stop = h_gpci_event_stop,
+	.read = h_gpci_event_read,
+
+	.event_idx = perf_swevent_event_idx,
+};
+
+static int hv_gpci_init(void)
+{
+	int r;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("Not running under phyp, not supported\n");
+		return -ENODEV;
+	}
+
+	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+module_init(hv_gpci_init);
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 7/8] powerpc/perf: add support for the hv 24x7 interface
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (5 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-01-16 23:53 ` [PATCH 8/8] powerpc/perf: add kconfig option for hypervisor provided counters Cody P Schafer
  2014-01-22  1:32 ` [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Michael Ellerman
  8 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

This provides a basic interface between hv_24x7 and perf. Similar to
the one provided for gpci, it lacks transaction support and does not
list any events.

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/hv-24x7.c | 354 ++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 354 insertions(+)
 create mode 100644 arch/powerpc/perf/hv-24x7.c

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
new file mode 100644
index 0000000..fb49e66
--- /dev/null
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -0,0 +1,354 @@
+/*
+ * Hypervisor supplied "24x7" performance counter support
+ *
+ * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
+ * Copyright 2014 IBM Corporation.
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#define pr_fmt(fmt) "hv-24x7: " fmt
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <asm/firmware.h>
+#include <asm/hvcall.h>
+#include <asm/hv_24x7.h>
+#include <asm/hv_gpci.h>
+#include <asm/io.h>
+
+/* See arch/powerpc/include/asm/24x7.h for details on the hcall interface */
+
+/* TODO: Merging events:
+ * - Think of the hcall as an interface to a 4d array of counters:
+ *   - x = domains
+ *   - y = indexes in the domain (core, chip, vcpu, node, etc)
+ *   - z = offset into the counter space
+ *   - w = lpars (guest vms, "logical partitions")
+ * - A single request is: x,y,y_last,z,z_last,w,w_last
+ *   - this means we can retrieve a rectangle of counters in y,z for a single x.
+ *
+ * - Things to consider (ignoring w):
+ *   - input  cost_per_request = 16
+ *   - output cost_per_result(ys,zs)  = 8 + 8 * ys + ys * zs
+ *   - limited number of requests per hcall (must fit into 4K bytes)
+ *     - 4k = 16 [buffer header] - 16 [request size] * request_count
+ *     - 255 requests per hcall
+ *   - sometimes it will be more efficient to read extra data and discard
+ */
+
+PMU_RANGE_ATTR(domain, config, 0, 3); /* u3 0-6, one of HV_24X7_PERF_DOMAIN */
+PMU_RANGE_ATTR(starting_index, config, 16, 31); /* u16 */
+PMU_RANGE_ATTR(offset, config, 32, 63); /* u32, see "data_offset" */
+PMU_RANGE_ATTR(lpar, config1, 0, 15); /* u16 */
+
+PMU_RANGE_RESV(reserved1, config,   4, 15);
+PMU_RANGE_RESV(reserved2, config1, 16, 63);
+PMU_RANGE_RESV(reserved3, config2,  0, 63);
+
+static struct attribute *format_attr[] = {
+	&format_attr_domain.attr,
+	&format_attr_offset.attr,
+	&format_attr_starting_index.attr,
+	&format_attr_lpar.attr,
+	NULL,
+};
+
+static struct attribute_group format_group = {
+	.name = "format",
+	.attrs = format_attr,
+};
+
+static const struct attribute_group *attr_groups[] = {
+	&format_group,
+	NULL,
+};
+
+struct hv_perf_caps {
+	u16 version;
+	u16 other_allowed:1,
+	    ga:1,
+	    expanded:1,
+	    lab:1,
+	    unused:12;
+};
+
+static unsigned long hv_perf_caps_get(struct hv_perf_caps *caps)
+{
+	unsigned long r;
+	struct p {
+		struct hv_get_perf_counter_info_params params;
+		struct cv_system_performance_capabilities caps;
+	} __packed __aligned(sizeof(uint64_t));
+
+	struct p arg = {
+		.params = {
+			.counter_request = cpu_to_be32(
+					CIR_system_performance_capabilities),
+			.starting_index = cpu_to_be32(-1),
+			.counter_info_version_in = 0,
+		}
+	};
+
+	r = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
+			       virt_to_phys(&arg), sizeof(arg));
+	caps->version = arg.params.counter_info_version_out;
+	caps->other_allowed = arg.caps.perf_collect_privlidged;
+	caps->ga = (arg.caps.capability_mask & CV_CM_GA) >> CV_CM_GA;
+	caps->expanded = (arg.caps.capability_mask & CV_CM_EXPANDED)
+				>> CV_CM_EXPANDED;
+	caps->lab = (arg.caps.capability_mask & CV_CM_LAB) >> CV_CM_LAB;
+
+	return r;
+}
+
+static bool is_physical_domain(int domain)
+{
+	return  domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CHIP ||
+		domain == HV_24X7_PERF_DOMAIN_PHYSICAL_CORE;
+}
+
+static unsigned long single_24x7_request(u8 domain, u32 offset, u16 ix,
+					 u16 lpar, u64 *res)
+{
+	unsigned long ret;
+	struct reqb {
+		struct hv_24x7_request_buffer buf;
+		struct hv_24x7_request req;
+	} request_buffer = {
+		.buf = {
+			.interface_version = HV_24X7_IF_VERSION_CURRENT,
+			.num_requests = 1,
+		},
+		.req = {
+			.performance_domain = domain,
+			.data_size = cpu_to_be16(8),
+			.data_offset = cpu_to_be32(offset),
+			.starting_lpar_ix = cpu_to_be16(lpar),
+			.max_num_lpars = cpu_to_be16(1),
+			.starting_ix = cpu_to_be16(ix),
+			.max_ix = cpu_to_be16(1),
+		}
+	};
+
+	struct resb {
+		struct hv_24x7_data_result_buffer buf;
+		struct hv_24x7_result res;
+		struct hv_24x7_result_element elem;
+		__be64 result;
+	} result_buffer = {};
+
+	ret = plpar_hcall_norets(H_GET_24X7_DATA,
+			virt_to_phys(&request_buffer), sizeof(request_buffer),
+			virt_to_phys(&result_buffer),  sizeof(result_buffer));
+
+	if (ret) {
+		/*
+		 * this failure is unexpected since we check if the exact same
+		 * hcall works in event_init
+		 */
+		pr_err_ratelimited("hcall failed: %d %#x %#x %d => 0x%lx (%ld) detail=0x%x failing ix=%x\n",
+				domain, offset, ix, lpar,
+				ret, ret,
+				result_buffer.buf.detailed_rc,
+				result_buffer.buf.failing_request_ix);
+		return ret;
+	}
+
+	*res = be64_to_cpu(result_buffer.result);
+	return ret;
+}
+
+static int h_24x7_event_init(struct perf_event *event)
+{
+	struct hv_perf_caps caps;
+	unsigned domain;
+	u64 ct;
+
+	/* Not our event */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* Unused areas must be 0 */
+	if (event_get_reserved1(event) ||
+	    event_get_reserved2(event) ||
+	    event_get_reserved3(event)) {
+		pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 0x%llx(0x%llx)\n",
+				event->attr.config,
+				event_get_reserved1(event),
+				event->attr.config1,
+				event_get_reserved2(event),
+				event->attr.config2,
+				event_get_reserved3(event));
+		return -EINVAL;
+	}
+
+	/* unsupported modes and filters */
+	if (event->attr.exclude_user   ||
+	    event->attr.exclude_kernel ||
+	    event->attr.exclude_hv     ||
+	    event->attr.exclude_idle   ||
+	    event->attr.exclude_host   ||
+	    event->attr.exclude_guest  ||
+	    is_sampling_event(event)) /* no sampling */
+		return -EINVAL;
+
+	/* no branch sampling */
+	if (has_branch_stack(event))
+		return -EOPNOTSUPP;
+
+	/* offset must be 8 byte aligned */
+	if (event_get_offset(event) % 8) {
+		pr_devel("bad alignment\n");
+		return -EINVAL;
+	}
+
+	/* Domains above 6 are invalid */
+	domain = event_get_domain(event);
+	if (domain > 6) {
+		pr_devel("invalid domain\n");
+		return -EINVAL;
+	}
+
+	if (hv_perf_caps_get(&caps)) {
+		pr_devel("could not get capabilities\n");
+		return -EIO;
+	}
+
+	/* PHYSICAL domains & other lpars require extra capabilities */
+	if (!caps.other_allowed && (is_physical_domain(domain) ||
+		(event_get_lpar(event) != event_get_lpar_max()))) {
+		pr_devel("hv permisions disallow\n");
+		return -EPERM;
+	}
+
+	/* see if the event complains */
+	if (single_24x7_request(event_get_domain(event),
+				event_get_offset(event),
+				event_get_starting_index(event),
+				event_get_lpar(event),
+				&ct)) {
+		pr_devel("test hcall failed\n");
+		return -EIO;
+	}
+
+	/*
+	 * Some of the events are per-cpu, some per-core, some per-chip, some
+	 * are global, and some access data from other virtual machines on the
+	 * same physical machine. We can't map the cpu value without a lot of
+	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
+	 */
+	event->cpu = 0;
+
+	perf_swevent_init_hrtimer(event);
+	return 0;
+}
+
+static u64 h_24x7_get_value(struct perf_event *event)
+{
+	unsigned long ret;
+	u64 ct;
+	ret = single_24x7_request(event_get_domain(event),
+				  event_get_offset(event),
+				  event_get_starting_index(event),
+				  event_get_lpar(event),
+				  &ct);
+	if (ret)
+		/* We checked this in event init, shouldn't fail here... */
+		return 0;
+
+	return ct;
+}
+
+static void h_24x7_event_update(struct perf_event *event)
+{
+	s64 prev;
+	u64 now;
+	now = h_24x7_get_value(event);
+	prev = local64_xchg(&event->hw.prev_count, now);
+	local64_add(now - prev, &event->count);
+}
+
+static void h_24x7_event_start(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_RELOAD)
+		local64_set(&event->hw.prev_count, h_24x7_get_value(event));
+	perf_swevent_start_hrtimer(event);
+}
+
+static void h_24x7_event_stop(struct perf_event *event, int flags)
+{
+	perf_swevent_cancel_hrtimer(event);
+	h_24x7_event_update(event);
+}
+
+static int h_24x7_event_add(struct perf_event *event, int flags)
+{
+	if (flags & PERF_EF_START)
+		h_24x7_event_start(event, flags);
+
+	return 0;
+}
+
+static void h_24x7_event_del(struct perf_event *event, int flags)
+{
+	h_24x7_event_stop(event, flags);
+}
+
+static void h_24x7_event_read(struct perf_event *event)
+{
+	h_24x7_event_update(event);
+}
+
+struct pmu h_24x7_pmu = {
+	.task_ctx_nr = perf_invalid_context,
+
+	.name = "hv_24x7",
+	.attr_groups = attr_groups,
+	.event_init  = h_24x7_event_init,
+	.add         = h_24x7_event_add,
+	.del         = h_24x7_event_del,
+	.start       = h_24x7_event_start,
+	.stop        = h_24x7_event_stop,
+	.read        = h_24x7_event_read,
+
+	.event_idx = perf_swevent_event_idx,
+};
+
+static int hv_24x7_init(void)
+{
+	int r;
+	unsigned long hret;
+	struct hv_perf_caps caps;
+
+	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
+		pr_info("not an lpar, disabled\n");
+		return -ENODEV;
+	}
+
+	hret = hv_perf_caps_get(&caps);
+	if (hret) {
+		pr_info("could not obtain capabilities, error 0x%80lx, disabling\n",
+				hret);
+		return -ENODEV;
+	}
+
+	pr_info("gpci interface versions: hv:0x%x, kernel:0x%x\n",
+			caps.version, COUNTER_INFO_VERSION_CURRENT);
+
+	pr_info("gpci interface capabilities: other:%d ga:%d expanded:%d lab:%d\n",
+			caps.other_allowed, caps.ga,
+			caps.expanded,
+			caps.lab);
+
+	r = perf_pmu_register(&h_24x7_pmu, h_24x7_pmu.name, -1);
+	if (r)
+		return r;
+
+	return 0;
+}
+
+module_init(hv_24x7_init);
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 8/8] powerpc/perf: add kconfig option for hypervisor provided counters
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (6 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 7/8] powerpc/perf: add support for the hv 24x7 interface Cody P Schafer
@ 2014-01-16 23:53 ` Cody P Schafer
  2014-01-22  1:32 ` [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Michael Ellerman
  8 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-01-16 23:53 UTC (permalink / raw)
  To: Linux PPC
  Cc: LKML, Peter Zijlstra, Paul Mackerras, Arnaldo Carvalho de Melo,
	Ingo Molnar, Benjamin Herrenschmidt, Cody P Schafer

Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
---
 arch/powerpc/perf/Makefile             | 2 ++
 arch/powerpc/platforms/Kconfig.cputype | 6 ++++++
 2 files changed, 8 insertions(+)

diff --git a/arch/powerpc/perf/Makefile b/arch/powerpc/perf/Makefile
index 60d71ee..5e5fcd2 100644
--- a/arch/powerpc/perf/Makefile
+++ b/arch/powerpc/perf/Makefile
@@ -11,5 +11,7 @@ obj32-$(CONFIG_PPC_PERF_CTRS)	+= mpc7450-pmu.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT) += core-fsl-emb.o
 obj-$(CONFIG_FSL_EMB_PERF_EVENT_E500) += e500-pmu.o e6500-pmu.o
 
+obj-$(CONFIG_HV_PERF_CTRS) += hv-24x7.o hv-gpci.o
+
 obj-$(CONFIG_PPC64)		+= $(obj64-y)
 obj-$(CONFIG_PPC32)		+= $(obj32-y)
diff --git a/arch/powerpc/platforms/Kconfig.cputype b/arch/powerpc/platforms/Kconfig.cputype
index bca2465..f98ec61 100644
--- a/arch/powerpc/platforms/Kconfig.cputype
+++ b/arch/powerpc/platforms/Kconfig.cputype
@@ -363,6 +363,12 @@ config PPC_PERF_CTRS
        help
          This enables the powerpc-specific perf_event back-end.
 
+config HV_PERF_CTRS
+       def_bool y
+       depends on PERF_EVENTS && PPC_HAVE_PMU_SUPPORT
+       help
+         Enable access to perf counters provided by the hypervisor
+
 config SMP
 	depends on PPC_BOOK3S || PPC_BOOK3E || FSL_BOOKE || PPC_47x
 	bool "Symmetric multi-processing support"
-- 
1.8.5.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
  2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
                   ` (7 preceding siblings ...)
  2014-01-16 23:53 ` [PATCH 8/8] powerpc/perf: add kconfig option for hypervisor provided counters Cody P Schafer
@ 2014-01-22  1:32 ` Michael Ellerman
  2014-01-22  9:37   ` Anshuman Khandual
  2014-01-23  0:11   ` Cody P Schafer
  8 siblings, 2 replies; 24+ messages in thread
From: Michael Ellerman @ 2014-01-22  1:32 UTC (permalink / raw)
  To: Cody P Schafer
  Cc: Linux PPC, Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
> performance counters: gpci ("get performance counter info") and 24x7.
> 
> The counters supplied by these interfaces are continually counting and never
> need to be (and cannot be) disabled or enabled. They additionally do not
> generate any interrupts. This makes them in some regards similar to software
> counters, and as a result their implimentation shares some common code (which
> an initial patch exposes) with the sw counters.

Hi Cody,

Can you please add some more explanation of this series.

In particular why do we need two new PMUs, and how do they relate to each
other?

And can you add an example of how I'd actually use them using perf.

cheers



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
  2014-01-22  1:32 ` [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Michael Ellerman
@ 2014-01-22  9:37   ` Anshuman Khandual
  2014-01-23  0:11   ` Cody P Schafer
  1 sibling, 0 replies; 24+ messages in thread
From: Anshuman Khandual @ 2014-01-22  9:37 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Cody P Schafer, Peter Zijlstra, LKML, Ingo Molnar,
	Paul Mackerras, Arnaldo Carvalho de Melo, Linux PPC

On 01/22/2014 07:02 AM, Michael Ellerman wrote:
> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>> performance counters: gpci ("get performance counter info") and 24x7.
>>
>> The counters supplied by these interfaces are continually counting and never
>> need to be (and cannot be) disabled or enabled. They additionally do not
>> generate any interrupts. This makes them in some regards similar to software
>> counters, and as a result their implimentation shares some common code (which
>> an initial patch exposes) with the sw counters.
> 
> Hi Cody,
> 
> Can you please add some more explanation of this series.
> 
> In particular why do we need two new PMUs, and how do they relate to each
> other?
> 
> And can you add an example of how I'd actually use them using perf.
> 

Yeah, agreed.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
  2014-01-22  1:32 ` [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Michael Ellerman
  2014-01-22  9:37   ` Anshuman Khandual
@ 2014-01-23  0:11   ` Cody P Schafer
  2014-01-31 20:59     ` Cody P Schafer
  1 sibling, 1 reply; 24+ messages in thread
From: Cody P Schafer @ 2014-01-23  0:11 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Linux PPC, Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/21/2014 05:32 PM, Michael Ellerman wrote:
> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>> performance counters: gpci ("get performance counter info") and 24x7.
>>
>> The counters supplied by these interfaces are continually counting and never
>> need to be (and cannot be) disabled or enabled. They additionally do not
>> generate any interrupts. This makes them in some regards similar to software
>> counters, and as a result their implimentation shares some common code (which
>> an initial patch exposes) with the sw counters.
> 
> Hi Cody,
> 
> Can you please add some more explanation of this series.
Sure
 > In particular why do we need two new PMUs, and how do they relate to each
> other?
These 2 PMUs end up providing access to some cpu, core, and chip level counters not
exposed via other interfaces, and additionally allow monitoring the performance of
other lpars (guests) on the same host system. Because it provides access to core and
chip level counters, this pair of PMUs could be thought of as powerpc's counterpart
to x86's uncore events.

As an example, "processor_bus_utilization_abc" and "processor_bus_utilization_wxyz"
(in hv_gpci.h) allow retreval of total cycles and idle cycles for various inter-chip buses.

GPCI is an interface that already exists on some power7 machines (depending on the fw
version), but is rather in-flexible and code intensive to add additional counters to.
The 24x7 interfaces currently are designed to co-exist with the gpci interface while
replacing most of gpci's functionality on newer systems. Right now, the 24x7 code I've
submitted uses the gpci calls to check if it has permission to access certain classes
of counters.

> And can you add an example of how I'd actually use them using perf.

# For gpci (formed from reading hv_gpci.h), gets "processor_time_in_timebase_cycles"
perf stat -e 'hv_gpci/counter_info_version=3,offset=0,length=8,secondary_index=0,starting_index=0xffffffff,request=0x10/' -r 0 -a -x ' ' sleep 0.1

# For 24x7, assuming access to hw+fw that supports it, gets a yet-to-be identified counter:
perf stat -e 'hv_24x7/domain=2,offset=8,starting_index=0,lpar=0xffffffff/' -r 0 -C 0 -x ' ' sleep 0.1


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters
  2014-01-23  0:11   ` Cody P Schafer
@ 2014-01-31 20:59     ` Cody P Schafer
  0 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-01-31 20:59 UTC (permalink / raw)
  To: Michael Ellerman
  Cc: Linux PPC, Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/22/2014 04:11 PM, Cody P Schafer wrote:
> On 01/21/2014 05:32 PM, Michael Ellerman wrote:
>> On Thu, 2014-01-16 at 15:53 -0800, Cody P Schafer wrote:
>>> These patches add basic pmus for 2 powerpc hypervisor interfaces to obtain
>>> performance counters: gpci ("get performance counter info") and 24x7.

Any comments on/things that need fixing for this patch set to be merged?


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  2014-01-16 23:53 ` [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  2014-02-03 21:19     ` Cody P Schafer
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

On Thu, 2014-16-01 at 23:53:47 UTC, Cody P Schafer wrote:
> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
> generate functions to extract the relevent bits from
> event->attr.config{,1,2} for use by sw-like pmus where the
> 'config{,1,2}' values don't map directly to hardware registers.

This is neat.

The split of the macros is a bit weird, ie. PMU_RANGE_RESV() doesn't really do
what it's name suggests.

I think you want one macro which creates the accessors, with a name that
reflects that - yeah I can't think of a good one right now, but "event" should
probably be in there because that's what it operates on.

Having a macro for the reserved regions is good, but you MUST actually check
that the reserved regions are zero. Otherwise you are permitting your caller to
pass junk in there and you then can't unreserved them in a future version of
the API.

So I think a macro that gives you a special reserved region routine would be
good, so you can write something like:

  if (event_check_reserved1() || event_check_reserved2())
  	return -EINVAL;


cheers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  2014-01-16 23:53 ` [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info) Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  2014-02-03 21:21     ` Cody P Schafer
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

On Thu, 2014-16-01 at 23:53:49 UTC, Cody P Schafer wrote:
> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  arch/powerpc/include/asm/hvcall.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
> index d8b600b..48d6efa 100644
> --- a/arch/powerpc/include/asm/hvcall.h
> +++ b/arch/powerpc/include/asm/hvcall.h
> @@ -269,11 +269,15 @@
>  #define H_COP			0x304
>  #define H_GET_MPP_X		0x314
>  #define H_SET_MODE		0x31C
> -#define MAX_HCALL_OPCODE	H_SET_MODE
> +#define H_GET_24X7_CATALOG_PAGE 0xF078
> +#define H_GET_24X7_DATA		0xF07C
> +#define H_GET_PERF_COUNTER_INFO 0xF080

Ugh, why the hell did they put them up there.

> +#define MAX_HCALL_OPCODE	H_GET_PERF_COUNTER_INFO

We have an array which is sized based on this, which is unpleasant.

I think you're better off putting these below in the platform specific section,
and leaving MAX_HCALL_OPCODE alone. The only downside is you can't use the
hcall tracing to see them.

>  /* Platform specific hcalls, used by KVM */
>  #define H_RTAS			0xf000


cheers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 2/8] perf core: export swevent hrtimer helpers
  2014-01-16 23:53 ` [PATCH 2/8] perf core: export swevent hrtimer helpers Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

Peter, Ingo, can we get your ACK on this please?

cheers


On Thu, 2014-16-01 at 23:53:48 UTC, Cody P Schafer wrote:
> Export the swevent hrtimer helpers currently only used in events/core.c
> to allow the addition of architecture specific sw-like pmus.

> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
> ---
>  include/linux/perf_event.h | 5 ++++-
>  kernel/events/core.c       | 8 ++++----
>  2 files changed, 8 insertions(+), 5 deletions(-)
> 
> diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
> index 8646e33..c5bc71a 100644
> --- a/include/linux/perf_event.h
> +++ b/include/linux/perf_event.h
> @@ -558,7 +558,10 @@ extern void perf_pmu_migrate_context(struct pmu *pmu,
>  				int src_cpu, int dst_cpu);
>  extern u64 perf_event_read_value(struct perf_event *event,
>  				 u64 *enabled, u64 *running);
> -
> +extern void perf_swevent_init_hrtimer(struct perf_event *event);
> +extern void perf_swevent_start_hrtimer(struct perf_event *event);
> +extern void perf_swevent_cancel_hrtimer(struct perf_event *event);
> +extern int perf_swevent_event_idx(struct perf_event *event);
>  
>  struct perf_sample_data {
>  	u64				type;
> diff --git a/kernel/events/core.c b/kernel/events/core.c
> index f574401..d881d1e 100644
> --- a/kernel/events/core.c
> +++ b/kernel/events/core.c
> @@ -5801,7 +5801,7 @@ static int perf_swevent_init(struct perf_event *event)
>  	return 0;
>  }
>  
> -static int perf_swevent_event_idx(struct perf_event *event)
> +int perf_swevent_event_idx(struct perf_event *event)
>  {
>  	return 0;
>  }
> @@ -6030,7 +6030,7 @@ static enum hrtimer_restart perf_swevent_hrtimer(struct hrtimer *hrtimer)
>  	return ret;
>  }
>  
> -static void perf_swevent_start_hrtimer(struct perf_event *event)
> +void perf_swevent_start_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  	s64 period;
> @@ -6052,7 +6052,7 @@ static void perf_swevent_start_hrtimer(struct perf_event *event)
>  				HRTIMER_MODE_REL_PINNED, 0);
>  }
>  
> -static void perf_swevent_cancel_hrtimer(struct perf_event *event)
> +void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  
> @@ -6064,7 +6064,7 @@ static void perf_swevent_cancel_hrtimer(struct perf_event *event)
>  	}
>  }
>  
> -static void perf_swevent_init_hrtimer(struct perf_event *event)
> +void perf_swevent_init_hrtimer(struct perf_event *event)
>  {
>  	struct hw_perf_event *hwc = &event->hw;
>  
> -- 
> 1.8.5.2
> 
> _______________________________________________
> Linuxppc-dev mailing list
> Linuxppc-dev@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
> 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 5/8] powerpc: add 24x7 interface header
  2014-01-16 23:53 ` [PATCH 5/8] powerpc: add 24x7 " Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  0 siblings, 0 replies; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

On Thu, 2014-16-01 at 23:53:51 UTC, Cody P Schafer wrote:
> 24x7 (also called hv_24x7 or H_24X7) is an interface to obtain
> performance counters from the hypervisor. These counters do not have a
> fixed format/possition and are instead documented in a "24x7 Catalog",
> which is provided by the hypervisor (that interface is also documented
> in this header).
> 
> This method of obtaining performance counters from the hypervisor is
> intended to paritialy replace the gpci interface.

Same comments as for the previous patch.

cheers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface
  2014-01-16 23:53 ` [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  2014-02-03 21:13     ` Cody P Schafer
  0 siblings, 1 reply; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

On Thu, 2014-16-01 at 23:53:52 UTC, Cody P Schafer wrote:
> This provides a basic link between perf and hv_gpci. Notably, it does
> not yet support transactions and does not list any events (they can
> still be manually composed).

What are the plans for listing?

The manual compose is nice but pretty hairy to use in practice I would think.

> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
> new file mode 100644
> index 0000000..31d9d59
> --- /dev/null
> +++ b/arch/powerpc/perf/hv-gpci.c
> @@ -0,0 +1,235 @@
> +/*
> + * Hypervisor supplied "gpci" ("get performance counter info") performance
> + * counter support
> + *
> + * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
> + * Copyright 2014 IBM Corporation.
> + *
> + * This program is free software; you can redistribute it and/or
> + * modify it under the terms of the GNU General Public License
> + * as published by the Free Software Foundation; either version
> + * 2 of the License, or (at your option) any later version.
> + */
> +#define pr_fmt(fmt) "hv-gpci: " fmt
> +
> +#include <linux/module.h>
> +#include <linux/perf_event.h>
> +#include <asm/firmware.h>
> +#include <asm/hvcall.h>
> +#include <asm/hv_gpci.h>
> +#include <asm/io.h>

Needed?

> +/* See arch/powerpc/include/asm/hv_gpci.h for details on the hcall interface */
> +
> +PMU_RANGE_ATTR(request, config, 0, 31); /* u32 */
> +PMU_RANGE_ATTR(starting_index, config, 32, 63); /* u32 */
> +PMU_RANGE_ATTR(secondary_index, config1, 0, 15); /* u16 */
> +PMU_RANGE_ATTR(counter_info_version, config1, 16, 23); /* u8 */
> +PMU_RANGE_ATTR(length, config1, 24, 31); /* u8, bytes of data (1-8) */
> +PMU_RANGE_ATTR(offset, config1, 32, 63); /* u32, byte offset */
> +
> +static struct attribute *format_attr[] = {
> +	&format_attr_request.attr,
> +	&format_attr_starting_index.attr,
> +	&format_attr_secondary_index.attr,
> +	&format_attr_counter_info_version.attr,
> +

Lonley blank line.

> +	&format_attr_offset.attr,
> +	&format_attr_length.attr,
> +	NULL,
> +};
> +
> +static struct attribute_group format_group = {
> +	.name = "format",
> +	.attrs = format_attr,
> +};
> +
> +static const struct attribute_group *attr_groups[] = {
> +	&format_group,
> +	NULL,
> +};
> +
> +static unsigned long single_gpci_request(u32 req, u32 starting_index,
> +		u16 secondary_index, u8 version_in, u32 offset, u8 length,
> +		u64 *value)

Passing the event and extracting the values in here would be neater IMHO.

> +{
> +	unsigned long ret;
> +	size_t i;
> +	u64 count;
> +
> +	struct {
> +		struct hv_get_perf_counter_info_params params;
> +		union {
> +			union h_gpci_cvs data;
> +			uint8_t bytes[sizeof(union h_gpci_cvs)];
> +		};
> +	} arg = {
> +		.params = {
> +			.counter_request = cpu_to_be32(req),
> +			.starting_index = cpu_to_be32(starting_index),
> +			.secondary_index = cpu_to_be16(secondary_index),
> +			.counter_info_version_in = version_in,
> +		}
> +	};
> +
> +	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
> +			virt_to_phys(&arg), sizeof(arg));
> +	if (ret) {
> +		pr_devel("hcall failed: 0x%lx\n", ret);
> +		return ret;
> +	}
> +
> +	/*
> +	 * we verify offset and length are within the zeroed buffer at event
> +	 * init.
> +	 */
> +	count = 0;
> +	for (i = offset; i < offset + length; i++)
> +		count |= arg.bytes[i] << (i - offset);
> +
> +	*value = count;
> +	return ret;
> +}
> +
> +static u64 h_gpci_get_value(struct perf_event *event)
> +{
> +	u64 count;
> +	unsigned long ret = single_gpci_request(event_get_request(event),
> +					event_get_starting_index(event),
> +					event_get_secondary_index(event),
> +					event_get_counter_info_version(event),
> +					event_get_offset(event),
> +					event_get_length(event),
> +					&count);
> +	if (ret)
> +		return 0;
> +	return count;
> +}
> +
> +static void h_gpci_event_update(struct perf_event *event)
> +{
> +	s64 prev;
> +	u64 now = h_gpci_get_value(event);
> +	prev = local64_xchg(&event->hw.prev_count, now);
> +	local64_add(now - prev, &event->count);
> +}
> +
> +static void h_gpci_event_start(struct perf_event *event, int flags)
> +{
> +	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
> +	perf_swevent_start_hrtimer(event);
> +}
> +
> +static void h_gpci_event_stop(struct perf_event *event, int flags)
> +{
> +	perf_swevent_cancel_hrtimer(event);
> +	h_gpci_event_update(event);
> +}
> +
> +static int h_gpci_event_add(struct perf_event *event, int flags)
> +{
> +	if (flags & PERF_EF_START)
> +		h_gpci_event_start(event, flags);
> +
> +	return 0;
> +}
> +
> +static void h_gpci_event_del(struct perf_event *event, int flags)
> +{
> +	h_gpci_event_stop(event, flags);
> +}

Can just hook del directly no?

> +static void h_gpci_event_read(struct perf_event *event)
> +{
> +	h_gpci_event_update(event);
> +}

Ditto.

> +static int h_gpci_event_init(struct perf_event *event)
> +{
> +	u64 count;
> +	u8 length;
> +
> +	/* Not our event */
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;

I don't understand why you need this?

> +	/* config2 is unused */
> +	if (event->attr.config2)
> +		return -EINVAL;

You must also check the reserved regions of config and config1.


> +	/* unsupported modes and filters */
> +	if (event->attr.exclude_user   ||
> +	    event->attr.exclude_kernel ||
> +	    event->attr.exclude_hv     ||
> +	    event->attr.exclude_idle   ||
> +	    event->attr.exclude_host   ||
> +	    event->attr.exclude_guest  ||
> +	    is_sampling_event(event)) /* no sampling */

I think you should also check sample_type.

> +		return -EINVAL;

Have you thought about inherit, pinned, exclusive?

> +
> +	/* no branch sampling */
> +	if (has_branch_stack(event))
> +		return -EOPNOTSUPP;
> +
> +	length = event_get_length(event);
> +	if (length < 1 || length > 8)
> +		return -EINVAL;
> +
> +	/* last byte within the buffer? */
> +	if ((event_get_offset(event) + length) > sizeof(union h_gpci_cvs))
> +		return -EINVAL;
> +
> +	/* check if the request works... */
> +	if (single_gpci_request(event_get_request(event),
> +				event_get_starting_index(event),
> +				event_get_secondary_index(event),
> +				event_get_counter_info_version(event),
> +				event_get_offset(event),
> +				length,
> +				&count))
> +		return -EINVAL;
> +
> +	/*
> +	 * Some of the events are per-cpu, some per-core, some per-chip, some
> +	 * are global, and some access data from other virtual machines on the
> +	 * same physical machine. We can't map the cpu value without a lot of
> +	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
> +	 */
> +	event->cpu = 0;

OK, but is having them all on cpu zero a good idea?

> +	perf_swevent_init_hrtimer(event);
> +	return 0;
> +}
> +
> +struct pmu h_gpci_pmu = {
> +	.task_ctx_nr = perf_invalid_context,
> +
> +	.name = "hv_gpci",
> +	.attr_groups = attr_groups,
> +	.event_init = h_gpci_event_init,
> +	.add = h_gpci_event_add,
> +	.del = h_gpci_event_del,
> +	.start = h_gpci_event_start,
> +	.stop = h_gpci_event_stop,
> +	.read = h_gpci_event_read,

Nice to have them align vertically.

> +	.event_idx = perf_swevent_event_idx,
> +};
> +
> +static int hv_gpci_init(void)
> +{
> +	int r;
> +
> +	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
> +		pr_info("Not running under phyp, not supported\n");

If only it was that simple :)

You'll see FW_FEATURE_LPAR in a KVM guest too.

There are at least two mechanisms for FW to indicate the presence of features,
"ibm,hypertas-functions" and "ibm,architecture-vec-5".

If HGPCI is not exposed in either of those then we'd want to do a probe hcall
here to try and detect it at runtime.


> +		return -ENODEV;
> +	}
> +
> +	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
> +	if (r)
> +		return r;
> +
> +	return 0;
> +}
> +
> +module_init(hv_gpci_init);

This is not modular code so you're discouraged from using module_init(),
arch_initcall() would probably be fine.

cheers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] powerpc: add hv_gpci interface header
  2014-01-16 23:53 ` [PATCH 4/8] powerpc: add hv_gpci interface header Cody P Schafer
@ 2014-02-01  5:58   ` Michael Ellerman
  2014-02-03 21:40     ` Cody P Schafer
  2014-02-05 23:14     ` Cody P Schafer
  0 siblings, 2 replies; 24+ messages in thread
From: Michael Ellerman @ 2014-02-01  5:58 UTC (permalink / raw)
  To: Cody P Schafer, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo, Cody P Schafer

On Thu, 2014-16-01 at 23:53:50 UTC, Cody P Schafer wrote:
> "H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
> here on) is an interface to retrieve specific performance counters and
> other data from the hypervisor. All outputs have a fixed format (and
> are represented as structs in this patch).

So how much of this are we actually using? A lot of these seem to be only used
in the union at the bottom of this file, and not touched elsewhere - or am I
missing something subtle?

Some of it doesn't seem to be used at all?
 
> diff --git a/arch/powerpc/include/asm/hv_gpci.h b/arch/powerpc/include/asm/hv_gpci.h

Any reason this can't just live in arch/powerpc/perf ?

> +++ b/arch/powerpc/include/asm/hv_gpci.h
> @@ -0,0 +1,490 @@
> +#ifndef LINUX_POWERPC_UAPI_HV_GPCI_H_
> +#define LINUX_POWERPC_UAPI_HV_GPCI_H_
> +
> +#include <linux/types.h>
> +
> +/* From the document "H_GetPerformanceCounterInfo Interface" v1.06, paritialy
> + * updated with v1.07 */

Is that public?

> +
> +/* H_GET_PERF_COUNTER_INFO argument */
> +struct hv_get_perf_counter_info_params {
> +	__be32 counter_request; /* I */
> +	__be32 starting_index;  /* IO */
> +	__be16 secondary_index; /* IO */
> +	__be16 returned_values; /* O */
> +	__be32 detail_rc; /* O, "only for 32bit clients" */
> +
> +	/*
> +	 * O, size each of counter_value element in bytes, only set for version
> +	 * >= 0x3
> +	 */
> +	__be16 cv_element_size;
> +
> +	/* I, funny if version < 0x3 */

Funny how? Or better still, do we only support operating on some minimum
sane version of the API?

> +	__u8 counter_info_version_in;
> +
> +	/* O, funny if version < 0x3 */
> +	__u8 counter_info_version_out;
> +	__u8 reserved[0xC];
> +	__u8 counter_value[];
> +} __packed;
> +
> +/* 8 => power8 (1.07)
> + * 6 => TLBIE  (1.07)
> + * 5 => (1.05)
> + * 4 => ?
> + * 3 => ?
> + * 2 => v7r7m0.phyp (?)
> + * 1 => v7r6m0.phyp (?)
> + * 0 => v7r{2,3,4}m0.phyp (?)
> + */

I think this is a mapping of version numbers to firmware releases, it should
say so.

> +#define COUNTER_INFO_VERSION_CURRENT 0x8
> +
> +/* these determine the counter_value[] layout and the meaning of starting_index
> + * and secondary_index */

Needs: leading capital, full stop, block comment.

> +enum counter_info_requests {
> +
> +	/* GENERAL */
> +
> +	/* @starting_index: "starting" physical processor index or -1 for

Why '"starting"' ?

> +	 *                  current phyical processor. Data is only collected
> +	 *                  for the processors' "primary" thread.
> +	 * @secondary_index: unused

This seems to be true in all cases at least for this enum, can we drop it?

> +	 */
> +	CIR_dispatch_timebase_by_processor = 0x10,

Any reason for the weird capitialisation? You've obviously learnt the
noCamelCase rule, but this is still a bit odd :)

> +
> +	/* @starting_index: starting partition id or -1 for the current logical
> +	 *                  partition (virtual machine).
> +	 * @secondary_index: unused
> +	 */
> +	CIR_entitled_capped_uncapped_donated_idle_timebase_by_partition = 0x20,
> +
> +	/* @starting_index: starting partition id or -1 for the current logical
> +	 *                  partition (virtual machine).
> +	 * @secondary_index: unused
> +	 */
> +	CIR_run_instructions_run_cycles_by_partition = 0x30,
> +
> +	/* @starting_index: must be -1 (to refer to the current partition)
> +	 * @secondary_index: unused
> +	 */
> +	CIR_system_performance_capabilities = 0x40,
> +
> +
> +	/* Data from this should only be considered valid if
> +	 * counter_info_version >= 0x3
> +	 * @starting_index: starting hardware chip id or -1 for the current hw
> +	 *		    chip id
> +	 * @secondary_index: unused
> +	 */
> +	CIR_processor_bus_utilization_abc_links = 0x50,
> +
> +	/* Data from this should only be considered valid if
> +	 * counter_info_version >= 0x3
> +	 * @starting_index: starting hardware chip id or -1 for the current hw
> +	 *		    chip id
> +	 * @secondary_index: unused
> +	 */
> +	CIR_processor_bus_utilization_wxyz_links = 0x60,
> +
> +
> +	/* EXPANDED */

??

These are only available if you have the DLC ?

> +	/* Avaliable if counter_info_version >= 0x3
> +	 * @starting_index: starting hardware chip id or -1 for the current hw
> +	 *		    chip id
> +	 * @secondary_index: unused
> +	 */
> +	CIR_processor_bus_utilization_gx_links = 0x70,
> +
> +	/* Avaliable if counter_info_version >= 0x3
> +	 * @starting_index: starting hardware chip id or -1 for the current hw
> +	 *		    chip id
> +	 * @secondary_index: unused
> +	 */
> +	CIR_processor_bus_utilization_mc_links = 0x80,
> +
> +	/* Avaliable if counter_info_version >= 0x3
> +	 * @starting_index: starting physical processor or -1 for the current
> +	 *                  physical processor
> +	 * @secondary_index: unused
> +	 */
> +	CIR_processor_config = 0x90,
> +
> +	/* Avaliable if counter_info_version >= 0x3
> +	 * @starting_index: starting physical processor or -1 for the current
> +	 *                  physical processor
> +	 * @secondary_index: unused
> +	 */
> +	CIR_current_processor_frequency = 0x91,
> +
> +	CIR_processor_core_utilization = 0x94,
> +
> +	CIR_processor_core_power_mode = 0x95,
> +
> +	CIR_affinity_domain_information_by_virutal_processor = 0xA0,
> +
> +	CIR_affinity_domain_info_by_domain = 0xB0,
> +
> +	CIR_affinity_domain_info_by_partition = 0xB1,
> +
> +	/* @starting_index: unused
> +	 * @secondary_index: unused
> +	 */
> +	CIR_physical_memory_info = 0xC0,
> +
> +	CIR_processor_bus_topology = 0xD0,
> +
> +	CIR_partition_hypervisor_queuing_times = 0xE0,
> +
> +	CIR_system_hypervisor_times = 0xF0,
> +
> +	/* LAB */
> +
> +	CIR_set_mmcrh = 0x80001000,
> +	CIR_get_hpmcx = 0x80002000,
> +};


cheers

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface
  2014-02-01  5:58   ` Michael Ellerman
@ 2014-02-03 21:13     ` Cody P Schafer
  0 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-02-03 21:13 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/31/2014 09:58 PM, Michael Ellerman wrote:
> On Thu, 2014-16-01 at 23:53:52 UTC, Cody P Schafer wrote:
>> This provides a basic link between perf and hv_gpci. Notably, it does
>> not yet support transactions and does not list any events (they can
>> still be manually composed).
>
> What are the plans for listing?

I'm looking at extending the sysfs api for listing perf events. We can't 
use the existing one as it doesn't let us parametrize the events by 
anything, meaning we'd need to list an essentially duplicate event for 
each cpu/core/chip and lpar (guest). The duplication in cpu/core/chip 
comes from not using the typical cpu parameter to perf_event_open() 
(which we don't do because it wouldn't make sense when the guest 
monitored isn't us).

> The manual compose is nice but pretty hairy to use in practice I would think.
>
>> diff --git a/arch/powerpc/perf/hv-gpci.c b/arch/powerpc/perf/hv-gpci.c
>> new file mode 100644
>> index 0000000..31d9d59
>> --- /dev/null
>> +++ b/arch/powerpc/perf/hv-gpci.c
>> @@ -0,0 +1,235 @@
>> +/*
>> + * Hypervisor supplied "gpci" ("get performance counter info") performance
>> + * counter support
>> + *
>> + * Author: Cody P Schafer <cody@linux.vnet.ibm.com>
>> + * Copyright 2014 IBM Corporation.
>> + *
>> + * This program is free software; you can redistribute it and/or
>> + * modify it under the terms of the GNU General Public License
>> + * as published by the Free Software Foundation; either version
>> + * 2 of the License, or (at your option) any later version.
>> + */
>> +#define pr_fmt(fmt) "hv-gpci: " fmt
>> +
>> +#include <linux/module.h>
>> +#include <linux/perf_event.h>
>> +#include <asm/firmware.h>
>> +#include <asm/hvcall.h>
>> +#include <asm/hv_gpci.h>
>> +#include <asm/io.h>
>
> Needed?
>

asm/io.h is for virt_to_phys(). And yes, I need it.

>> +/* See arch/powerpc/include/asm/hv_gpci.h for details on the hcall interface */
>> +
>> +PMU_RANGE_ATTR(request, config, 0, 31); /* u32 */
>> +PMU_RANGE_ATTR(starting_index, config, 32, 63); /* u32 */
>> +PMU_RANGE_ATTR(secondary_index, config1, 0, 15); /* u16 */
>> +PMU_RANGE_ATTR(counter_info_version, config1, 16, 23); /* u8 */
>> +PMU_RANGE_ATTR(length, config1, 24, 31); /* u8, bytes of data (1-8) */
>> +PMU_RANGE_ATTR(offset, config1, 32, 63); /* u32, byte offset */
>> +
>> +static struct attribute *format_attr[] = {
>> +	&format_attr_request.attr,
>> +	&format_attr_starting_index.attr,
>> +	&format_attr_secondary_index.attr,
>> +	&format_attr_counter_info_version.attr,
>> +
> Lonley blank line.

Which was seperating "real" attributes from those provided by the kernel 
(gpci doesn't know about "offset" and "length"). I'll remove.

>> +	&format_attr_offset.attr,
>> +	&format_attr_length.attr,
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group format_group = {
>> +	.name = "format",
>> +	.attrs = format_attr,
>> +};
>> +
>> +static const struct attribute_group *attr_groups[] = {
>> +	&format_group,
>> +	NULL,
>> +};
>> +
>> +static unsigned long single_gpci_request(u32 req, u32 starting_index,
>> +		u16 secondary_index, u8 version_in, u32 offset, u8 length,
>> +		u64 *value)
>
> Passing the event and extracting the values in here would be neater IMHO.
>

Well, I'll at least add a wrapper that does that. The idea here was to 
separate the perf specific logic from the gpci specific logic. And I'll 
end up taking advantage of that separation when doing the interface 
probing (mentioned way down near the end of this email).

>> +{
>> +	unsigned long ret;
>> +	size_t i;
>> +	u64 count;
>> +
>> +	struct {
>> +		struct hv_get_perf_counter_info_params params;
>> +		union {
>> +			union h_gpci_cvs data;
>> +			uint8_t bytes[sizeof(union h_gpci_cvs)];
>> +		};
>> +	} arg = {
>> +		.params = {
>> +			.counter_request = cpu_to_be32(req),
>> +			.starting_index = cpu_to_be32(starting_index),
>> +			.secondary_index = cpu_to_be16(secondary_index),
>> +			.counter_info_version_in = version_in,
>> +		}
>> +	};
>> +
>> +	ret = plpar_hcall_norets(H_GET_PERF_COUNTER_INFO,
>> +			virt_to_phys(&arg), sizeof(arg));
>> +	if (ret) {
>> +		pr_devel("hcall failed: 0x%lx\n", ret);
>> +		return ret;
>> +	}
>> +
>> +	/*
>> +	 * we verify offset and length are within the zeroed buffer at event
>> +	 * init.
>> +	 */
>> +	count = 0;
>> +	for (i = offset; i < offset + length; i++)
>> +		count |= arg.bytes[i] << (i - offset);
>> +
>> +	*value = count;
>> +	return ret;
>> +}
>> +
>> +static u64 h_gpci_get_value(struct perf_event *event)
>> +{
>> +	u64 count;
>> +	unsigned long ret = single_gpci_request(event_get_request(event),
>> +					event_get_starting_index(event),
>> +					event_get_secondary_index(event),
>> +					event_get_counter_info_version(event),
>> +					event_get_offset(event),
>> +					event_get_length(event),
>> +					&count);
>> +	if (ret)
>> +		return 0;
>> +	return count;
>> +}
>> +
>> +static void h_gpci_event_update(struct perf_event *event)
>> +{
>> +	s64 prev;
>> +	u64 now = h_gpci_get_value(event);
>> +	prev = local64_xchg(&event->hw.prev_count, now);
>> +	local64_add(now - prev, &event->count);
>> +}
>> +
>> +static void h_gpci_event_start(struct perf_event *event, int flags)
>> +{
>> +	local64_set(&event->hw.prev_count, h_gpci_get_value(event));
>> +	perf_swevent_start_hrtimer(event);
>> +}
>> +
>> +static void h_gpci_event_stop(struct perf_event *event, int flags)
>> +{
>> +	perf_swevent_cancel_hrtimer(event);
>> +	h_gpci_event_update(event);
>> +}
>> +
>> +static int h_gpci_event_add(struct perf_event *event, int flags)
>> +{
>> +	if (flags & PERF_EF_START)
>> +		h_gpci_event_start(event, flags);
>> +
>> +	return 0;
>> +}
>> +
>> +static void h_gpci_event_del(struct perf_event *event, int flags)
>> +{
>> +	h_gpci_event_stop(event, flags);
>> +}
>
> Can just hook del directly no?

Yep, good point.

>
>> +static void h_gpci_event_read(struct perf_event *event)
>> +{
>> +	h_gpci_event_update(event);
>> +}
>
> Ditto.

Yep, agreed.

>
>> +static int h_gpci_event_init(struct perf_event *event)
>> +{
>> +	u64 count;
>> +	u8 length;
>> +
>> +	/* Not our event */
>> +	if (event->attr.type != event->pmu->type)
>> +		return -ENOENT;
> I don't understand why you need this?

It's part of the standard perf stuff. In perf_init_event() we do an idr 
lookup to identify the responsible PMU, and it that fails the PMUs get 
iterated over and need to return -ENOENT for events they don't own. That 
said, I have no idea what cases the idr lookup would fail.

>> +	/* config2 is unused */
>> +	if (event->attr.config2)
>> +		return -EINVAL;
>
> You must also check the reserved regions of config and config1.

There aren't any (I use all the bits in config and config1).

>> +	/* unsupported modes and filters */
>> +	if (event->attr.exclude_user   ||
>> +	    event->attr.exclude_kernel ||
>> +	    event->attr.exclude_hv     ||
>> +	    event->attr.exclude_idle   ||
>> +	    event->attr.exclude_host   ||
>> +	    event->attr.exclude_guest  ||
>> +	    is_sampling_event(event)) /* no sampling */
>
> I think you should also check sample_type.

Why? Many PMUs don't, I'm not seeing what the need is here.

>> +		return -EINVAL;
>
> Have you thought about inherit, pinned, exclusive?
>

exclusive and pinned don't make any sense because we don't have any 
actual PMU hw that is being consumed. The event counters are always 
counting.
inherit isn't relevant because we forbid all task tracking (via "+ 
.task_ctx_nr = perf_invalid_context," below).

I haven't forbidden either of these because I'm not sure there is a 
point in doing so (the exclude_* stuff is used for some permissions 
checking)

>> +
>> +	/* no branch sampling */
>> +	if (has_branch_stack(event))
>> +		return -EOPNOTSUPP;
>> +
>> +	length = event_get_length(event);
>> +	if (length < 1 || length > 8)
>> +		return -EINVAL;
>> +
>> +	/* last byte within the buffer? */
>> +	if ((event_get_offset(event) + length) > sizeof(union h_gpci_cvs))
>> +		return -EINVAL;
>> +
>> +	/* check if the request works... */
>> +	if (single_gpci_request(event_get_request(event),
>> +				event_get_starting_index(event),
>> +				event_get_secondary_index(event),
>> +				event_get_counter_info_version(event),
>> +				event_get_offset(event),
>> +				length,
>> +				&count))
>> +		return -EINVAL;
>> +
>> +	/*
>> +	 * Some of the events are per-cpu, some per-core, some per-chip, some
>> +	 * are global, and some access data from other virtual machines on the
>> +	 * same physical machine. We can't map the cpu value without a lot of
>> +	 * work. Instead, we pick an arbitrary cpu for all events on this pmu.
>> +	 */
>> +	event->cpu = 0;
>
> OK, but is having them all on cpu zero a good idea?

Probably not. We can remove this line without any real effect on the 
event output. Ideally, we'd have a way to tell perf that our PMU isn't 
associated with _any_ cpu (which would also eliminate some cross-cpu 
calls which trigger IPIs), and that is something I'm looking at, but 
perf widely assumes that events are tied to cpus (or some cpu mappable 
unit like a core or socket).

>
>> +	perf_swevent_init_hrtimer(event);
>> +	return 0;
>> +}
>> +
>> +struct pmu h_gpci_pmu = {
>> +	.task_ctx_nr = perf_invalid_context,
>> +
>> +	.name = "hv_gpci",
>> +	.attr_groups = attr_groups,
>> +	.event_init = h_gpci_event_init,
>> +	.add = h_gpci_event_add,
>> +	.del = h_gpci_event_del,
>> +	.start = h_gpci_event_start,
>> +	.stop = h_gpci_event_stop,
>> +	.read = h_gpci_event_read,
>
> Nice to have them align vertically.
Ack.

>
>> +	.event_idx = perf_swevent_event_idx,
>> +};
>> +
>> +static int hv_gpci_init(void)
>> +{
>> +	int r;
>> +
>> +	if (!firmware_has_feature(FW_FEATURE_LPAR)) {
>> +		pr_info("Not running under phyp, not supported\n");
>
> If only it was that simple :)
>
> You'll see FW_FEATURE_LPAR in a KVM guest too.
>
> There are at least two mechanisms for FW to indicate the presence of features,
> "ibm,hypertas-functions" and "ibm,architecture-vec-5".

I don't _think_ it is exposed in either of those, or if it is the docs 
don't say so :(.

> If HGPCI is not exposed in either of those then we'd want to do a probe hcall
> here to try and detect it at runtime.

Yep, having a probing hcall would be the way to go. The only issue there 
is handling migration between firmware which has or doesn't have support 
for this hcall. I suppose hooking a migration notifier of some kind 
would let us get past that bump.

>> +		return -ENODEV;
>> +	}
>> +
>> +	r = perf_pmu_register(&h_gpci_pmu, h_gpci_pmu.name, -1);
>> +	if (r)
>> +		return r;
>> +
>> +	return 0;
>> +}
>> +
>> +module_init(hv_gpci_init);
>
> This is not modular code so you're discouraged from using module_init(),
> arch_initcall() would probably be fine.
>

Got it.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus
  2014-02-01  5:58   ` Michael Ellerman
@ 2014-02-03 21:19     ` Cody P Schafer
  0 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-02-03 21:19 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/31/2014 09:58 PM, Michael Ellerman wrote:
> On Thu, 2014-16-01 at 23:53:47 UTC, Cody P Schafer wrote:
>> Add PMU_RANGE_ATTR() and PMU_RANGE_RESV() (for reserved areas) which
>> generate functions to extract the relevent bits from
>> event->attr.config{,1,2} for use by sw-like pmus where the
>> 'config{,1,2}' values don't map directly to hardware registers.
>
> This is neat.
>
> The split of the macros is a bit weird, ie. PMU_RANGE_RESV() doesn't really do
> what it's name suggests.
>
> I think you want one macro which creates the accessors, with a name that
> reflects that - yeah I can't think of a good one right now, but "event" should
> probably be in there because that's what it operates on.
>
> Having a macro for the reserved regions is good, but you MUST actually check
> that the reserved regions are zero. Otherwise you are permitting your caller to
> pass junk in there and you then can't unreserved them in a future version of
> the API.
>
> So I think a macro that gives you a special reserved region routine would be
> good, so you can write something like:
>
>    if (event_check_reserved1() || event_check_reserved2())
>    	return -EINVAL;
>

The way it's set up right now, RESV is just a hint to the user of the 
PMU_RANGE_ATTR() and PMU_RANGE_RESV() macros to indicate which to use. 
RESV simply avoids creating an attr format which would go unused only in 
the case where the range is a reserved one (and gcc would complain about 
it).

I don't like the "event_check_foo()" bit because that is actually 
identical to "event_get_foo()", I don't see a point in generating 
differently named functions that do exactly the same thing.

The current user (hv-24x7.c) of PMU_RANGE_RESV() already does the 
appropriate checking:

	if (event_get_reserved1(event) ||
	    event_get_reserved2(event) ||
	    event_get_reserved3(event)) {
		pr_devel("reserved set when forbidden 0x%llx(0x%llx) 0x%llx(0x%llx) 
0x%llx(0x%llx)\n",
				event->attr.config,
				event_get_reserved1(event),
				event->attr.config1,
				event_get_reserved2(event),
				event->attr.config2,
				event_get_reserved3(event));
		return -EINVAL;
	}


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info)
  2014-02-01  5:58   ` Michael Ellerman
@ 2014-02-03 21:21     ` Cody P Schafer
  0 siblings, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-02-03 21:21 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/31/2014 09:58 PM, Michael Ellerman wrote:
> On Thu, 2014-16-01 at 23:53:49 UTC, Cody P Schafer wrote:
>> Signed-off-by: Cody P Schafer <cody@linux.vnet.ibm.com>
>> ---
>>   arch/powerpc/include/asm/hvcall.h | 6 +++++-
>>   1 file changed, 5 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/powerpc/include/asm/hvcall.h b/arch/powerpc/include/asm/hvcall.h
>> index d8b600b..48d6efa 100644
>> --- a/arch/powerpc/include/asm/hvcall.h
>> +++ b/arch/powerpc/include/asm/hvcall.h
>> @@ -269,11 +269,15 @@
>>   #define H_COP			0x304
>>   #define H_GET_MPP_X		0x314
>>   #define H_SET_MODE		0x31C
>> -#define MAX_HCALL_OPCODE	H_SET_MODE
>> +#define H_GET_24X7_CATALOG_PAGE 0xF078
>> +#define H_GET_24X7_DATA		0xF07C
>> +#define H_GET_PERF_COUNTER_INFO 0xF080
>
> Ugh, why the hell did they put them up there.
>
>> +#define MAX_HCALL_OPCODE	H_GET_PERF_COUNTER_INFO
>
> We have an array which is sized based on this, which is unpleasant.
>
> I think you're better off putting these below in the platform specific section,
> and leaving MAX_HCALL_OPCODE alone. The only downside is you can't use the
> hcall tracing to see them.

Ya, I'm aware. I've got them up there as I did want to trace them :) . I 
don't see a big issue with moving them out of that section, though.

>>   /* Platform specific hcalls, used by KVM */
>>   #define H_RTAS			0xf000
>



^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] powerpc: add hv_gpci interface header
  2014-02-01  5:58   ` Michael Ellerman
@ 2014-02-03 21:40     ` Cody P Schafer
  2014-02-05 23:14     ` Cody P Schafer
  1 sibling, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-02-03 21:40 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo

On 01/31/2014 09:58 PM, Michael Ellerman wrote:
> On Thu, 2014-16-01 at 23:53:50 UTC, Cody P Schafer wrote:
>> "H_GetPerformanceCounterInfo" (refered to as hv_gpci or just gpci from
>> here on) is an interface to retrieve specific performance counters and
>> other data from the hypervisor. All outputs have a fixed format (and
>> are represented as structs in this patch).
>
> So how much of this are we actually using? A lot of these seem to be only used
> in the union at the bottom of this file, and not touched elsewhere - or am I
> missing something subtle?
>
> Some of it doesn't seem to be used at all?

We're using very little of the actual interface. In 24x7 we use 
cv_system_performance_capabilities, but other than that all the cv_* 
structures go unused.

>
>> diff --git a/arch/powerpc/include/asm/hv_gpci.h b/arch/powerpc/include/asm/hv_gpci.h
>
> Any reason this can't just live in arch/powerpc/perf ?
>
>> +++ b/arch/powerpc/include/asm/hv_gpci.h
>> @@ -0,0 +1,490 @@
>> +#ifndef LINUX_POWERPC_UAPI_HV_GPCI_H_
>> +#define LINUX_POWERPC_UAPI_HV_GPCI_H_
>> +
>> +#include <linux/types.h>
>> +
>> +/* From the document "H_GetPerformanceCounterInfo Interface" v1.06, paritialy
>> + * updated with v1.07 */
>
> Is that public?

Nope.

>> +
>> +/* H_GET_PERF_COUNTER_INFO argument */
>> +struct hv_get_perf_counter_info_params {
>> +	__be32 counter_request; /* I */
>> +	__be32 starting_index;  /* IO */
>> +	__be16 secondary_index; /* IO */
>> +	__be16 returned_values; /* O */
>> +	__be32 detail_rc; /* O, "only for 32bit clients" */
>> +
>> +	/*
>> +	 * O, size each of counter_value element in bytes, only set for version
>> +	 * >= 0x3
>> +	 */
>> +	__be16 cv_element_size;
>> +
>> +	/* I, funny if version < 0x3 */
>
> Funny how? Or better still, do we only support operating on some minimum
> sane version of the API?

Right now the perf stuff is setup to let the user specify the version 
they want to operate in, and we avoid trying to provide cross-version 
compatibility.

Funny = must be set to 0x0. I'll update the comment.

>
>> +	__u8 counter_info_version_in;
>> +
>> +	/* O, funny if version < 0x3 */
>> +	__u8 counter_info_version_out;
>> +	__u8 reserved[0xC];
>> +	__u8 counter_value[];
>> +} __packed;
>> +
>> +/* 8 => power8 (1.07)
>> + * 6 => TLBIE  (1.07)
>> + * 5 => (1.05)
>> + * 4 => ?
>> + * 3 => ?
>> + * 2 => v7r7m0.phyp (?)
>> + * 1 => v7r6m0.phyp (?)
>> + * 0 => v7r{2,3,4}m0.phyp (?)
>> + */
>
> I think this is a mapping of version numbers to firmware releases, it should
> say so.

It is, I'll note it.

>
>> +#define COUNTER_INFO_VERSION_CURRENT 0x8
>> +
>> +/* these determine the counter_value[] layout and the meaning of starting_index
>> + * and secondary_index */
>
> Needs: leading capital, full stop, block comment.

ack

>
>> +enum counter_info_requests {
>> +
>> +	/* GENERAL */
>> +
>> +	/* @starting_index: "starting" physical processor index or -1 for
>
> Why '"starting"' ?

The requests are designed to return a sequence of data blocks. For 
example, in this case where the index refers to a physical processor id, 
one can request phys processor 0,1,2,3 in a single hcall (as long as the 
buffer is sized appropriately. We don't take advantage of this.

>
>> +	 *                  current phyical processor. Data is only collected
>> +	 *                  for the processors' "primary" thread.
>> +	 * @secondary_index: unused
>
> This seems to be true in all cases at least for this enum, can we drop it?
>

CIR_affinity_domain_information_by_virutal_processor uses 
secondary_index. That said, I'll note that if not mentioned, 
secondary_index is unused and use that to cut out some of the duplication.


>> +	 */
>> +	CIR_dispatch_timebase_by_processor = 0x10,
>
> Any reason for the weird capitialisation? You've obviously learnt the
> noCamelCase rule, but this is still a bit odd :)
>

Mainly because these end up rather long and I was getting tired of 
fiddling with caps lock (and this weird capitalization lets macros do 
fun things without having to specify a long name twice, not that I'm 
doing that right now). Also, the spec gives them as 
"Dispatch_timebase_by_processor" (not that this really matters).

I'll properly capitalize them all if that's preferred (I expect it is).

>> +
>> +	/* @starting_index: starting partition id or -1 for the current logical
>> +	 *                  partition (virtual machine).
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_entitled_capped_uncapped_donated_idle_timebase_by_partition = 0x20,
>> +
>> +	/* @starting_index: starting partition id or -1 for the current logical
>> +	 *                  partition (virtual machine).
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_run_instructions_run_cycles_by_partition = 0x30,
>> +
>> +	/* @starting_index: must be -1 (to refer to the current partition)
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_system_performance_capabilities = 0x40,
>> +
>> +
>> +	/* Data from this should only be considered valid if
>> +	 * counter_info_version >= 0x3
>> +	 * @starting_index: starting hardware chip id or -1 for the current hw
>> +	 *		    chip id
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_processor_bus_utilization_abc_links = 0x50,
>> +
>> +	/* Data from this should only be considered valid if
>> +	 * counter_info_version >= 0x3
>> +	 * @starting_index: starting hardware chip id or -1 for the current hw
>> +	 *		    chip id
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_processor_bus_utilization_wxyz_links = 0x60,
>> +
>> +
>> +	/* EXPANDED */
>
> ??
>
> These are only available if you have the DLC ?

There is a bit set by the hv that lets us get at them. 
"system_performance_capabilities" lets us get that bit. Unfortunately, 
we can't just ignore it (the hcall gives us an access error if they 
aren't enabled). And I'm not sure what the mechanism will be for 
enabling them on end user boxes. So yes, probably DLC.

>> +	/* Avaliable if counter_info_version >= 0x3
>> +	 * @starting_index: starting hardware chip id or -1 for the current hw
>> +	 *		    chip id
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_processor_bus_utilization_gx_links = 0x70,
>> +
>> +	/* Avaliable if counter_info_version >= 0x3
>> +	 * @starting_index: starting hardware chip id or -1 for the current hw
>> +	 *		    chip id
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_processor_bus_utilization_mc_links = 0x80,
>> +
>> +	/* Avaliable if counter_info_version >= 0x3
>> +	 * @starting_index: starting physical processor or -1 for the current
>> +	 *                  physical processor
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_processor_config = 0x90,
>> +
>> +	/* Avaliable if counter_info_version >= 0x3
>> +	 * @starting_index: starting physical processor or -1 for the current
>> +	 *                  physical processor
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_current_processor_frequency = 0x91,
>> +
>> +	CIR_processor_core_utilization = 0x94,
>> +
>> +	CIR_processor_core_power_mode = 0x95,
>> +
>> +	CIR_affinity_domain_information_by_virutal_processor = 0xA0,
>> +
>> +	CIR_affinity_domain_info_by_domain = 0xB0,
>> +
>> +	CIR_affinity_domain_info_by_partition = 0xB1,
>> +
>> +	/* @starting_index: unused
>> +	 * @secondary_index: unused
>> +	 */
>> +	CIR_physical_memory_info = 0xC0,
>> +
>> +	CIR_processor_bus_topology = 0xD0,
>> +
>> +	CIR_partition_hypervisor_queuing_times = 0xE0,
>> +
>> +	CIR_system_hypervisor_times = 0xF0,
>> +
>> +	/* LAB */
>> +
>> +	CIR_set_mmcrh = 0x80001000,
>> +	CIR_get_hpmcx = 0x80002000,
>> +};
>
>
> cheers
>


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 4/8] powerpc: add hv_gpci interface header
  2014-02-01  5:58   ` Michael Ellerman
  2014-02-03 21:40     ` Cody P Schafer
@ 2014-02-05 23:14     ` Cody P Schafer
  1 sibling, 0 replies; 24+ messages in thread
From: Cody P Schafer @ 2014-02-05 23:14 UTC (permalink / raw)
  To: Michael Ellerman, Linux PPC
  Cc: Peter Zijlstra, LKML, Ingo Molnar, Paul Mackerras,
	Arnaldo Carvalho de Melo


>> diff --git a/arch/powerpc/include/asm/hv_gpci.h b/arch/powerpc/include/asm/hv_gpci.h
>
> Any reason this can't just live in arch/powerpc/perf ?
>

Nope, it should be able to keep the header there for now. As this 
interface allows determination of the HW topology, we may have some code 
that exposes that (in sysfs) at some point, which doesn't really belong 
in arch/powerpc/perf (though we could just put it there anyhow).


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2014-02-05 23:14 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2014-01-16 23:53 [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Cody P Schafer
2014-01-16 23:53 ` [PATCH 1/8] perf: add PMU_RANGE_ATTR() helper for use by sw-like pmus Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-02-03 21:19     ` Cody P Schafer
2014-01-16 23:53 ` [PATCH 2/8] perf core: export swevent hrtimer helpers Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-01-16 23:53 ` [PATCH 3/8] powerpc: add hvcalls for 24x7 and gpci (get performance counter info) Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-02-03 21:21     ` Cody P Schafer
2014-01-16 23:53 ` [PATCH 4/8] powerpc: add hv_gpci interface header Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-02-03 21:40     ` Cody P Schafer
2014-02-05 23:14     ` Cody P Schafer
2014-01-16 23:53 ` [PATCH 5/8] powerpc: add 24x7 " Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-01-16 23:53 ` [PATCH 6/8] powerpc/perf: add support for the hv gpci (get performance counter info) interface Cody P Schafer
2014-02-01  5:58   ` Michael Ellerman
2014-02-03 21:13     ` Cody P Schafer
2014-01-16 23:53 ` [PATCH 7/8] powerpc/perf: add support for the hv 24x7 interface Cody P Schafer
2014-01-16 23:53 ` [PATCH 8/8] powerpc/perf: add kconfig option for hypervisor provided counters Cody P Schafer
2014-01-22  1:32 ` [PATCH 0/8] Add support for PowerPC Hypervisor supplied performance counters Michael Ellerman
2014-01-22  9:37   ` Anshuman Khandual
2014-01-23  0:11   ` Cody P Schafer
2014-01-31 20:59     ` Cody P Schafer

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).