All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-17 19:43 ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel, a.p.zijlstra, mingo, iommu, joro; +Cc: Suravee Suthikulpanit

From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>

These patches implement the AMD IOMMU Performance Counter functionality
via custom perf PMU and implement static counting for various IOMMU
translations.

    1) Extend the AMD IOMMU initialization to include performance
       counter enablement.

    2) The perf AMD IOMMU PMU to manage performance counters, which
       interface with the AMD IOMMU core driver.

Steven L Kinney (1):
  perf/x86/amd: Adding IOMMU PC resource management

Suravee Suthikulpanit (1):
  perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

 arch/x86/kernel/cpu/Makefile               |    2 +-
 arch/x86/kernel/cpu/perf_event_amd_iommu.c |  500 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
 drivers/iommu/amd_iommu_init.c             |  121 ++++++-
 drivers/iommu/amd_iommu_proto.h            |    7 +
 drivers/iommu/amd_iommu_types.h            |   12 +-
 6 files changed, 675 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

-- 
1.7.10.4



^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-17 19:43 ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit-5C7GfCeVMHo @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	joro-zLv9SwRftAIdnm+yROfE0A

From: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>

These patches implement the AMD IOMMU Performance Counter functionality
via custom perf PMU and implement static counting for various IOMMU
translations.

    1) Extend the AMD IOMMU initialization to include performance
       counter enablement.

    2) The perf AMD IOMMU PMU to manage performance counters, which
       interface with the AMD IOMMU core driver.

Steven L Kinney (1):
  perf/x86/amd: Adding IOMMU PC resource management

Suravee Suthikulpanit (1):
  perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation

 arch/x86/kernel/cpu/Makefile               |    2 +-
 arch/x86/kernel/cpu/perf_event_amd_iommu.c |  500 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
 drivers/iommu/amd_iommu_init.c             |  121 ++++++-
 drivers/iommu/amd_iommu_proto.h            |    7 +
 drivers/iommu/amd_iommu_types.h            |   12 +-
 6 files changed, 675 insertions(+), 7 deletions(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

-- 
1.7.10.4

^ permalink raw reply	[flat|nested] 30+ messages in thread

* [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel, a.p.zijlstra, mingo, iommu, joro
  Cc: Steven L Kinney, Steven Kinney, Suravee Suthikulpanit

From: Steven L Kinney <Steven.Kinney@amd.com>

Add functionality to check the availability of the AMD IOMMU Performance
Counters and export this functionality to other core drivers, such as in this
case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
family/model other than the presence of the IOMMU with PC enabled.

The AMD IOMMU PC support static counting only at this time.

Signed-off-by: Steven Kinney <steven.kinney@amd.com>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
V2 Changes:
	- Modify the amd_iommu_pc_get_set_reg_val function 
	to support 64 bit values. 
	- Fix logic to properly clear amd_iommu_pc_present when
	initialization failed.

 drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/amd_iommu_proto.h |    7 +++
 drivers/iommu/amd_iommu_types.h |   12 +++-
 3 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index bf51abb..395fa29 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
 u32 amd_iommu_max_pasids __read_mostly = ~0;
 
 bool amd_iommu_v2_present __read_mostly;
+bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
@@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
  */
 static u8 __iomem * __init iommu_map_mmio_space(u64 address)
 {
-	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
+	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {
 		pr_err("AMD-Vi: Can not reserve memory region %llx for mmio\n",
 			address);
 		pr_err("AMD-Vi: This is a BIOS bug. Please contact your hardware vendor\n");
 		return NULL;
 	}
 
-	return (u8 __iomem *)ioremap_nocache(address, MMIO_REGION_LENGTH);
+	return (u8 __iomem *)ioremap_nocache(address, MMIO_REG_END_OFFSET);
 }
 
 static void __init iommu_unmap_mmio_space(struct amd_iommu *iommu)
 {
 	if (iommu->mmio_base)
 		iounmap(iommu->mmio_base);
-	release_mem_region(iommu->mmio_phys, MMIO_REGION_LENGTH);
+	release_mem_region(iommu->mmio_phys, MMIO_REG_END_OFFSET);
 }
 
 /****************************************************************************
@@ -1160,6 +1161,33 @@ static int __init init_iommu_all(struct acpi_table_header *table)
 	return 0;
 }
 
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+{
+	u64 val = 0xabcd, val2 = 0;
+
+	if (!iommu_feature(iommu, FEATURE_PC))
+		return;
+
+	amd_iommu_pc_present = true;
+
+	/* Check if the performance counters can be written to */
+	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
+	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
+	    (val != val2)) {
+		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
+		amd_iommu_pc_present = false;
+		return;
+	}
+
+	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");
+
+	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
+	iommu->max_banks = (u8) ((val >> 12) & 0x3f);
+	iommu->max_counters = (u8) ((val >> 7) & 0xf);
+}
+
+
 static int iommu_init_pci(struct amd_iommu *iommu)
 {
 	int cap_ptr = iommu->cap_ptr;
@@ -1226,6 +1254,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
 
+	init_iommu_perf_ctr(iommu);
+
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;
 
@@ -2232,3 +2262,88 @@ bool amd_iommu_v2_supported(void)
 	return amd_iommu_v2_present;
 }
 EXPORT_SYMBOL(amd_iommu_v2_supported);
+
+/****************************************************************************
+ *
+ * IOMMU EFR Performance Counter support functionality. This code allows
+ * access to the IOMMU PC functionality.
+ *
+ ****************************************************************************/
+
+u8 amd_iommu_pc_get_max_banks(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	/* locate the iommu governing the devid */
+	iommu = amd_iommu_rlookup_table[devid];
+
+	if (iommu)
+		return iommu->max_banks;
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);
+
+bool amd_iommu_pc_supported(void)
+{
+	return amd_iommu_pc_present;
+}
+EXPORT_SYMBOL(amd_iommu_pc_supported);
+
+u8 amd_iommu_pc_get_max_counters(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	/* locate the iommu governing the devid */
+	iommu = amd_iommu_rlookup_table[devid];
+
+	if (iommu)
+		return iommu->max_counters;
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);
+
+int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+				    u64 *value, bool is_write)
+{
+	struct amd_iommu *iommu;
+	u32 offset;
+	u32 max_offset_lim;
+
+	/* Make sure the IOMMU PC resource is available */
+	if (!amd_iommu_pc_present) {
+		pr_info("AMD IOMMU - PC Not supported.\n");
+		return -ENODEV;
+	}
+
+	/* locate the iommu associated with the device ID */
+	iommu = amd_iommu_rlookup_table[devid];
+	if (iommu == NULL)
+		return -ENODEV;
+
+	/* check for valid iommu pc register indexing */
+	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
+		return -ENODEV;
+
+	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
+
+	/* limit the offset to the hw defined mmio region aperture */
+	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
+				(iommu->max_counters << 8) | 0x28);
+	if ((offset < MMIO_CNTR_REG_OFFSET) ||
+	    (offset > max_offset_lim))
+		return -EINVAL;
+
+	if (is_write) {
+		writel((u32)*value, iommu->mmio_base + offset);
+		writel((*value >> 32), iommu->mmio_base + offset + 4);
+	} else {
+		*value = readl(iommu->mmio_base + offset + 4);
+		*value <<= 32;
+		*value = readl(iommu->mmio_base + offset);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_set_reg_val);
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index c294961..95ed6de 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -56,6 +56,13 @@ extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
 extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
 extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
 
+/* IOMMU Performance Counter functions */
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+				    u64 *value, bool is_write);
+
 #define PPR_SUCCESS			0x0
 #define PPR_INVALID			0x1
 #define PPR_FAILURE			0xf
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0285a21..a95f0ae 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -38,9 +38,6 @@
 #define ALIAS_TABLE_ENTRY_SIZE		2
 #define RLOOKUP_TABLE_ENTRY_SIZE	(sizeof(void *))
 
-/* Length of the MMIO region for the AMD IOMMU */
-#define MMIO_REGION_LENGTH       0x4000
-
 /* Capability offsets used by the driver */
 #define MMIO_CAP_HDR_OFFSET	0x00
 #define MMIO_RANGE_OFFSET	0x0c
@@ -78,6 +75,10 @@
 #define MMIO_STATUS_OFFSET	0x2020
 #define MMIO_PPR_HEAD_OFFSET	0x2030
 #define MMIO_PPR_TAIL_OFFSET	0x2038
+#define MMIO_CNTR_CONF_OFFSET	0x4000
+#define MMIO_CNTR_REG_OFFSET	0x40000
+#define MMIO_REG_END_OFFSET	0x80000
+
 
 
 /* Extended Feature Bits */
@@ -507,6 +508,7 @@ struct amd_iommu {
 
 	/* physical address of MMIO space */
 	u64 mmio_phys;
+
 	/* virtual address of MMIO space */
 	u8 __iomem *mmio_base;
 
@@ -584,6 +586,10 @@ struct amd_iommu {
 
 	/* The l2 indirect registers */
 	u32 stored_l2[0x83];
+
+	/* The maximum PC banks and counters/bank (PCSup=1) */
+	u8 max_banks;
+	u8 max_counters;
 };
 
 struct devid_map {
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit-5C7GfCeVMHo @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	joro-zLv9SwRftAIdnm+yROfE0A
  Cc: Steven Kinney

From: Steven L Kinney <Steven.Kinney-5C7GfCeVMHo@public.gmane.org>

Add functionality to check the availability of the AMD IOMMU Performance
Counters and export this functionality to other core drivers, such as in this
case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
family/model other than the presence of the IOMMU with PC enabled.

The AMD IOMMU PC support static counting only at this time.

Signed-off-by: Steven Kinney <steven.kinney-5C7GfCeVMHo@public.gmane.org>
Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
---
V2 Changes:
	- Modify the amd_iommu_pc_get_set_reg_val function 
	to support 64 bit values. 
	- Fix logic to properly clear amd_iommu_pc_present when
	initialization failed.

 drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
 drivers/iommu/amd_iommu_proto.h |    7 +++
 drivers/iommu/amd_iommu_types.h |   12 +++-
 3 files changed, 134 insertions(+), 6 deletions(-)

diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
index bf51abb..395fa29 100644
--- a/drivers/iommu/amd_iommu_init.c
+++ b/drivers/iommu/amd_iommu_init.c
@@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
 u32 amd_iommu_max_pasids __read_mostly = ~0;
 
 bool amd_iommu_v2_present __read_mostly;
+bool amd_iommu_pc_present __read_mostly;
 
 bool amd_iommu_force_isolation __read_mostly;
 
@@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
  */
 static u8 __iomem * __init iommu_map_mmio_space(u64 address)
 {
-	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
+	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {
 		pr_err("AMD-Vi: Can not reserve memory region %llx for mmio\n",
 			address);
 		pr_err("AMD-Vi: This is a BIOS bug. Please contact your hardware vendor\n");
 		return NULL;
 	}
 
-	return (u8 __iomem *)ioremap_nocache(address, MMIO_REGION_LENGTH);
+	return (u8 __iomem *)ioremap_nocache(address, MMIO_REG_END_OFFSET);
 }
 
 static void __init iommu_unmap_mmio_space(struct amd_iommu *iommu)
 {
 	if (iommu->mmio_base)
 		iounmap(iommu->mmio_base);
-	release_mem_region(iommu->mmio_phys, MMIO_REGION_LENGTH);
+	release_mem_region(iommu->mmio_phys, MMIO_REG_END_OFFSET);
 }
 
 /****************************************************************************
@@ -1160,6 +1161,33 @@ static int __init init_iommu_all(struct acpi_table_header *table)
 	return 0;
 }
 
+
+static void init_iommu_perf_ctr(struct amd_iommu *iommu)
+{
+	u64 val = 0xabcd, val2 = 0;
+
+	if (!iommu_feature(iommu, FEATURE_PC))
+		return;
+
+	amd_iommu_pc_present = true;
+
+	/* Check if the performance counters can be written to */
+	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
+	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
+	    (val != val2)) {
+		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
+		amd_iommu_pc_present = false;
+		return;
+	}
+
+	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");
+
+	val = readl(iommu->mmio_base + MMIO_CNTR_CONF_OFFSET);
+	iommu->max_banks = (u8) ((val >> 12) & 0x3f);
+	iommu->max_counters = (u8) ((val >> 7) & 0xf);
+}
+
+
 static int iommu_init_pci(struct amd_iommu *iommu)
 {
 	int cap_ptr = iommu->cap_ptr;
@@ -1226,6 +1254,8 @@ static int iommu_init_pci(struct amd_iommu *iommu)
 	if (iommu->cap & (1UL << IOMMU_CAP_NPCACHE))
 		amd_iommu_np_cache = true;
 
+	init_iommu_perf_ctr(iommu);
+
 	if (is_rd890_iommu(iommu->dev)) {
 		int i, j;
 
@@ -2232,3 +2262,88 @@ bool amd_iommu_v2_supported(void)
 	return amd_iommu_v2_present;
 }
 EXPORT_SYMBOL(amd_iommu_v2_supported);
+
+/****************************************************************************
+ *
+ * IOMMU EFR Performance Counter support functionality. This code allows
+ * access to the IOMMU PC functionality.
+ *
+ ****************************************************************************/
+
+u8 amd_iommu_pc_get_max_banks(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	/* locate the iommu governing the devid */
+	iommu = amd_iommu_rlookup_table[devid];
+
+	if (iommu)
+		return iommu->max_banks;
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);
+
+bool amd_iommu_pc_supported(void)
+{
+	return amd_iommu_pc_present;
+}
+EXPORT_SYMBOL(amd_iommu_pc_supported);
+
+u8 amd_iommu_pc_get_max_counters(u16 devid)
+{
+	struct amd_iommu *iommu;
+
+	/* locate the iommu governing the devid */
+	iommu = amd_iommu_rlookup_table[devid];
+
+	if (iommu)
+		return iommu->max_counters;
+
+	return -ENODEV;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);
+
+int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+				    u64 *value, bool is_write)
+{
+	struct amd_iommu *iommu;
+	u32 offset;
+	u32 max_offset_lim;
+
+	/* Make sure the IOMMU PC resource is available */
+	if (!amd_iommu_pc_present) {
+		pr_info("AMD IOMMU - PC Not supported.\n");
+		return -ENODEV;
+	}
+
+	/* locate the iommu associated with the device ID */
+	iommu = amd_iommu_rlookup_table[devid];
+	if (iommu == NULL)
+		return -ENODEV;
+
+	/* check for valid iommu pc register indexing */
+	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
+		return -ENODEV;
+
+	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
+
+	/* limit the offset to the hw defined mmio region aperture */
+	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
+				(iommu->max_counters << 8) | 0x28);
+	if ((offset < MMIO_CNTR_REG_OFFSET) ||
+	    (offset > max_offset_lim))
+		return -EINVAL;
+
+	if (is_write) {
+		writel((u32)*value, iommu->mmio_base + offset);
+		writel((*value >> 32), iommu->mmio_base + offset + 4);
+	} else {
+		*value = readl(iommu->mmio_base + offset + 4);
+		*value <<= 32;
+		*value = readl(iommu->mmio_base + offset);
+	}
+
+	return 0;
+}
+EXPORT_SYMBOL(amd_iommu_pc_get_set_reg_val);
diff --git a/drivers/iommu/amd_iommu_proto.h b/drivers/iommu/amd_iommu_proto.h
index c294961..95ed6de 100644
--- a/drivers/iommu/amd_iommu_proto.h
+++ b/drivers/iommu/amd_iommu_proto.h
@@ -56,6 +56,13 @@ extern int amd_iommu_domain_set_gcr3(struct iommu_domain *dom, int pasid,
 extern int amd_iommu_domain_clear_gcr3(struct iommu_domain *dom, int pasid);
 extern struct iommu_domain *amd_iommu_get_v2_domain(struct pci_dev *pdev);
 
+/* IOMMU Performance Counter functions */
+extern bool amd_iommu_pc_supported(void);
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
+				    u64 *value, bool is_write);
+
 #define PPR_SUCCESS			0x0
 #define PPR_INVALID			0x1
 #define PPR_FAILURE			0xf
diff --git a/drivers/iommu/amd_iommu_types.h b/drivers/iommu/amd_iommu_types.h
index 0285a21..a95f0ae 100644
--- a/drivers/iommu/amd_iommu_types.h
+++ b/drivers/iommu/amd_iommu_types.h
@@ -38,9 +38,6 @@
 #define ALIAS_TABLE_ENTRY_SIZE		2
 #define RLOOKUP_TABLE_ENTRY_SIZE	(sizeof(void *))
 
-/* Length of the MMIO region for the AMD IOMMU */
-#define MMIO_REGION_LENGTH       0x4000
-
 /* Capability offsets used by the driver */
 #define MMIO_CAP_HDR_OFFSET	0x00
 #define MMIO_RANGE_OFFSET	0x0c
@@ -78,6 +75,10 @@
 #define MMIO_STATUS_OFFSET	0x2020
 #define MMIO_PPR_HEAD_OFFSET	0x2030
 #define MMIO_PPR_TAIL_OFFSET	0x2038
+#define MMIO_CNTR_CONF_OFFSET	0x4000
+#define MMIO_CNTR_REG_OFFSET	0x40000
+#define MMIO_REG_END_OFFSET	0x80000
+
 
 
 /* Extended Feature Bits */
@@ -507,6 +508,7 @@ struct amd_iommu {
 
 	/* physical address of MMIO space */
 	u64 mmio_phys;
+
 	/* virtual address of MMIO space */
 	u8 __iomem *mmio_base;
 
@@ -584,6 +586,10 @@ struct amd_iommu {
 
 	/* The l2 indirect registers */
 	u32 stored_l2[0x83];
+
+	/* The maximum PC banks and counters/bank (PCSup=1) */
+	u8 max_banks;
+	u8 max_counters;
 };
 
 struct devid_map {
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel, a.p.zijlstra, mingo, iommu, joro
  Cc: Suravee Suthikulpanit, Suravee Suthikulpanit

From: Suravee Suthikulpanit <Suravee.Suthikulpanit@amd.com>

Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

./perf stat -a -e amd_iommu/<events>/ <command>
or
./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

The IOMMU performance counter support is available starting in the 
AMD family15h model 0x30. For information regarding IOMMU performance 
counter configuration, please see the AMD IOMMU v2.5 specification.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
---
V3 Changes:
This version contain several changes:
	- Properly implement system-wide mode.
	- Add "amd_iommu/cpumask".
	- Add "amd_iommu/events"
	- Simplify "amd_iommu/format"
	- Add supports for devid, domid, and pasid filtering

 arch/x86/kernel/cpu/Makefile               |    2 +-
 arch/x86/kernel/cpu/perf_event_amd_iommu.c |  498 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
 3 files changed, 539 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index b0684e4..fcbd3b8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
 
 ifdef CONFIG_PERF_EVENTS
-obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
+obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.c b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
new file mode 100644
index 0000000..fad04f2
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <Steven.Kinney@amd.com>
+ * Author: Suravee Suthikulpanit <Suraveee.Suthikulpanit@amd.com>
+ *
+ * Perf: amd_iommu - AMD IOMMU Performance Counter PMU implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/cpumask.h>
+#include <linux/slab.h>
+
+#include "perf_event.h"
+#include "perf_event_amd_iommu.h"
+
+#define _GET_BANK(ev)       ((u8)(ev->hw.extra_reg.reg >> 8))
+#define _GET_CNTR(ev)       ((u8)(ev->hw.extra_reg.reg))
+
+/* iommu pmu config masks */
+#define _GET_CSOURCE(ev)    ((ev->hw.config & 0xFFULL))
+#define _GET_DEVID(ev)      ((ev->hw.config >> 8)  & 0xFFFFULL)
+#define _GET_PASID(ev)      ((ev->hw.config >> 24) & 0xFFFFULL)
+#define _GET_DOMID(ev)      ((ev->hw.config >> 40) & 0xFFFFULL)
+#define _GET_DEVID_MASK(ev) ((ev->hw.extra_reg.config)  & 0xFFFFULL)
+#define _GET_PASID_MASK(ev) ((ev->hw.extra_reg.config >> 16) & 0xFFFFULL)
+#define _GET_DOMID_MASK(ev) ((ev->hw.extra_reg.config >> 32) & 0xFFFFULL)
+
+static struct perf_amd_iommu __perf_iommu;
+
+struct perf_amd_iommu {
+	struct pmu pmu;
+	u8 max_banks;
+	u8 max_counters;
+	u64 cntr_assign_mask;
+	raw_spinlock_t lock;
+	const struct attribute_group *attr_groups[4];
+};
+
+#define format_group	attr_groups[0]
+#define cpumask_group	attr_groups[1]
+#define events_group	attr_groups[2]
+#define null_group	attr_groups[3]
+
+/*---------------------------------------------
+ * sysfs format attributes
+ *---------------------------------------------*/
+PMU_FORMAT_ATTR(csource,    "config:0-7");
+PMU_FORMAT_ATTR(devid,      "config:8-23");
+PMU_FORMAT_ATTR(pasid,      "config:24-39");
+PMU_FORMAT_ATTR(domid,      "config:40-55");
+PMU_FORMAT_ATTR(devid_mask, "config1:0-15");
+PMU_FORMAT_ATTR(pasid_mask, "config1:16-31");
+PMU_FORMAT_ATTR(domid_mask, "config1:32-47");
+
+static struct attribute *iommu_format_attrs[] = {
+	&format_attr_csource.attr,
+	&format_attr_devid.attr,
+	&format_attr_pasid.attr,
+	&format_attr_domid.attr,
+	&format_attr_devid_mask.attr,
+	&format_attr_pasid_mask.attr,
+	&format_attr_domid_mask.attr,
+	NULL,
+};
+
+static struct attribute_group amd_iommu_format_group = {
+	.name = "format",
+	.attrs = iommu_format_attrs,
+};
+
+/*---------------------------------------------
+ * sysfs events attributes
+ *---------------------------------------------*/
+struct amd_iommu_event_desc {
+	struct kobj_attribute attr;
+	const char *event;
+};
+
+static ssize_t _iommu_event_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct amd_iommu_event_desc *event =
+		container_of(attr, struct amd_iommu_event_desc, attr);
+	return sprintf(buf, "%s\n", event->event);
+}
+
+#define AMD_IOMMU_EVENT_DESC(_name, _event)			\
+{								\
+	.attr  = __ATTR(_name, 0444, _iommu_event_show, NULL),	\
+	.event = _event,					\
+}
+
+static struct amd_iommu_event_desc amd_iommu_v2_event_descs[] = {
+	AMD_IOMMU_EVENT_DESC(mem_pass_untrans,        "csource=0x01"),
+	AMD_IOMMU_EVENT_DESC(mem_pass_pretrans,       "csource=0x02"),
+	AMD_IOMMU_EVENT_DESC(mem_pass_excl,           "csource=0x03"),
+	AMD_IOMMU_EVENT_DESC(mem_target_abort,        "csource=0x04"),
+	AMD_IOMMU_EVENT_DESC(mem_trans_total,         "csource=0x05"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_hit,   "csource=0x06"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_mis,   "csource=0x07"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_hit,   "csource=0x08"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_mis,   "csource=0x09"),
+	AMD_IOMMU_EVENT_DESC(mem_dte_hit,             "csource=0x0a"),
+	AMD_IOMMU_EVENT_DESC(mem_dte_mis,             "csource=0x0b"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_tot,       "csource=0x0c"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_nst,       "csource=0x0d"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_gst,       "csource=0x0e"),
+	AMD_IOMMU_EVENT_DESC(int_dte_hit,             "csource=0x0f"),
+	AMD_IOMMU_EVENT_DESC(int_dte_mis,             "csource=0x10"),
+	AMD_IOMMU_EVENT_DESC(cmd_processed,           "csource=0x11"),
+	AMD_IOMMU_EVENT_DESC(cmd_processed_inv,       "csource=0x12"),
+	AMD_IOMMU_EVENT_DESC(tlb_inv,                 "csource=0x13"),
+	{ /* end: all zeroes */ },
+};
+
+/*---------------------------------------------
+ * sysfs cpumask attributes
+ *---------------------------------------------*/
+static cpumask_t iommu_cpumask;
+
+static ssize_t _iommu_cpumask_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	int n = cpulist_scnprintf(buf, PAGE_SIZE - 2, &iommu_cpumask);
+	buf[n++] = '\n';
+	buf[n] = '\0';
+	return n;
+}
+static DEVICE_ATTR(cpumask, S_IRUGO, _iommu_cpumask_show, NULL);
+
+static struct attribute *iommu_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static struct attribute_group amd_iommu_cpumask_group = {
+	.attrs = iommu_cpumask_attrs,
+};
+
+/*---------------------------------------------*/
+
+static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
+{
+	unsigned long flags;
+	int shift, bank, cntr, retval;
+	int max_banks = perf_iommu->max_banks;
+	int max_cntrs = perf_iommu->max_counters;
+
+	raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+
+	for (bank = 0, shift = 0; bank < max_banks; bank++) {
+		for (cntr = 0; cntr < max_cntrs; cntr++) {
+			shift = bank + (bank*3) + cntr;
+			if (perf_iommu->cntr_assign_mask & (1ULL<<shift)) {
+				continue;
+			} else {
+				perf_iommu->cntr_assign_mask |= (1ULL<<shift);
+				retval = ((u16)((u16)bank<<8) | (u8)(cntr));
+				goto out;
+			}
+		}
+	}
+	retval = -ENOSPC;
+out:
+	raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+	return retval;
+}
+
+static int clear_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu,
+					u8 bank, u8 cntr)
+{
+	unsigned long flags;
+	int max_banks, max_cntrs;
+	int shift = 0;
+
+	max_banks = perf_iommu->max_banks;
+	max_cntrs = perf_iommu->max_counters;
+
+	if ((bank > max_banks) || (cntr > max_cntrs))
+		return -EINVAL;
+
+	shift = bank + cntr + (bank*3);
+
+	raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+	perf_iommu->cntr_assign_mask &= ~(1ULL<<shift);
+	raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+
+	return 0;
+}
+
+static int perf_iommu_event_init(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct perf_amd_iommu *perf_iommu;
+	u64 config, config1;
+
+	/* test the event attr type check for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/*
+	 * IOMMU counters are shared across all cores.
+	 * Therefore, it does not support per-process mode.
+	 * Also, it does not support event sampling mode.
+	 */
+	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+		return -EINVAL;
+
+	/* IOMMU counters do not have usr/os/guest/host bits */
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	perf_iommu = &__perf_iommu;
+
+	if (event->pmu != &perf_iommu->pmu)
+		return -ENOENT;
+
+	if (perf_iommu) {
+		config = event->attr.config;
+		config1 = event->attr.config1;
+	} else {
+		return -EINVAL;
+	}
+
+	/* integrate with iommu base devid (0000), assume one iommu */
+	perf_iommu->max_banks =
+		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
+	perf_iommu->max_counters =
+		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
+
+	/* update the hw_perf_event struct with the iommu config data */
+	hwc->config = config;
+	hwc->extra_reg.config = config1;
+
+	return 0;
+}
+
+static void perf_iommu_enable_event(struct perf_event *ev)
+{
+	u8 csource = _GET_CSOURCE(ev);
+	u16 devid = _GET_DEVID(ev);
+	u64 reg = 0ULL;
+
+	reg = csource;
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+
+	reg = 0ULL | devid | (_GET_DEVID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_DEVID_MATCH_REG, &reg, true);
+
+	reg = 0ULL | _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_PASID_MATCH_REG, &reg, true);
+
+	reg = 0ULL | _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_DOMID_MATCH_REG, &reg, true);
+}
+
+static void perf_iommu_disable_event(struct perf_event *event)
+{
+	u64 reg = 0ULL;
+
+	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+			_GET_BANK(event), _GET_CNTR(event),
+			IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+}
+
+static void perf_iommu_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	pr_debug("perf: amd_iommu:perf_iommu_start\n");
+	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
+		return;
+
+	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+	hwc->state = 0;
+
+	if (flags & PERF_EF_RELOAD) {
+		u64 prev_raw_count =  local64_read(&hwc->prev_count);
+		amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+				_GET_BANK(event), _GET_CNTR(event),
+				IOMMU_PC_COUNTER_REG, &prev_raw_count, true);
+	}
+
+	perf_iommu_enable_event(event);
+	perf_event_update_userpage(event);
+
+}
+
+static void perf_iommu_read(struct perf_event *event)
+{
+	u64 count = 0ULL;
+	u64 prev_raw_count = 0ULL;
+	u64 delta = 0ULL;
+	struct hw_perf_event *hwc = &event->hw;
+	pr_debug("perf: amd_iommu:perf_iommu_read\n");
+
+	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+				_GET_BANK(event), _GET_CNTR(event),
+				IOMMU_PC_COUNTER_REG, &count, false);
+
+	/* IOMMU pc counter register is only 48 bits */
+	count &= 0xFFFFFFFFFFFFULL;
+
+	prev_raw_count =  local64_read(&hwc->prev_count);
+	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+					count) != prev_raw_count)
+		return;
+
+	delta = count - prev_raw_count;
+	local64_add(delta, &event->count);
+
+}
+
+static void perf_iommu_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 config;
+
+	pr_debug("perf: amd_iommu:perf_iommu_stop\n");
+
+	if (hwc->state & PERF_HES_UPTODATE)
+		return;
+
+	perf_iommu_disable_event(event);
+	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+	hwc->state |= PERF_HES_STOPPED;
+
+	if (hwc->state & PERF_HES_UPTODATE)
+		return;
+
+	config = hwc->config;
+	perf_iommu_read(event);
+	hwc->state |= PERF_HES_UPTODATE;
+}
+
+static int perf_iommu_add(struct perf_event *event, int flags)
+{
+	int retval;
+	struct perf_amd_iommu *perf_iommu =
+			container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+	pr_debug("perf: amd_iommu:perf_iommu_add\n");
+	event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	/* request an iommu bank/counter */
+	retval = get_next_avail_iommu_bnk_cntr(perf_iommu);
+	if (retval != -ENOSPC)
+		event->hw.extra_reg.reg = (u16)retval;
+	else
+		return retval;
+
+	if (flags & PERF_EF_START)
+		perf_iommu_start(event, PERF_EF_RELOAD);
+
+	return 0;
+}
+
+static void perf_iommu_del(struct perf_event *event, int flags)
+{
+	struct perf_amd_iommu *perf_iommu =
+			container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+	pr_debug("perf: amd_iommu:perf_iommu_del\n");
+	perf_iommu_stop(event, PERF_EF_UPDATE);
+
+	/* clear the assigned iommu bank/counter */
+	clear_avail_iommu_bnk_cntr(perf_iommu,
+				     _GET_BANK(event),
+				     _GET_CNTR(event));
+
+	perf_event_update_userpage(event);
+}
+
+static __init int _init_events_attrs(struct perf_amd_iommu *perf_iommu)
+{
+	struct attribute **attrs;
+	struct attribute_group *attr_group;
+	int i = 0, j;
+
+	while (amd_iommu_v2_event_descs[i].attr.attr.name)
+		i++;
+
+	attr_group = kzalloc(sizeof(struct attribute *)
+		* (i + 1) + sizeof(*attr_group), GFP_KERNEL);
+	if (!attr_group)
+		return -ENOMEM;
+
+	attrs = (struct attribute **)(attr_group + 1);
+	for (j = 0; j < i; j++)
+		attrs[j] = &amd_iommu_v2_event_descs[j].attr.attr;
+
+	attr_group->name = "events";
+	attr_group->attrs = attrs;
+	perf_iommu->events_group = attr_group;
+
+	return 0;
+}
+
+static __init void amd_iommu_pc_exit(void)
+{
+	if (__perf_iommu.events_group != NULL) {
+		kfree(__perf_iommu.events_group);
+		__perf_iommu.events_group = NULL;
+	}
+}
+
+static __init int _init_perf_amd_iommu(
+	struct perf_amd_iommu *perf_iommu, char *name)
+{
+	int ret;
+
+	raw_spin_lock_init(&perf_iommu->lock);
+
+	/* Init format attributes */
+	perf_iommu->format_group = &amd_iommu_format_group;
+
+	/* Init cpumask attributes to only core 0 */
+	cpumask_set_cpu(0, &iommu_cpumask);
+	perf_iommu->cpumask_group = &amd_iommu_cpumask_group;
+
+	/* Init events attributes */
+	if (_init_events_attrs(perf_iommu) != 0)
+		pr_err("perf: amd_iommu: Only support raw events.\n");
+
+	/* Init null attributes */
+	perf_iommu->null_group = NULL;
+	perf_iommu->pmu.attr_groups = perf_iommu->attr_groups;
+
+	ret = perf_pmu_register(&perf_iommu->pmu, name, -1);
+	if (ret) {
+		pr_err("perf: amd_iommu: Failed to initialized.\n");
+		amd_iommu_pc_exit();
+	} else {
+		pr_info("perf: amd_iommu: Detected. (%d banks, %d counters/bank)\n",
+			amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID),
+			amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID));
+	}
+
+	return ret;
+}
+
+static struct perf_amd_iommu __perf_iommu = {
+	.pmu = {
+		.event_init	= perf_iommu_event_init,
+		.add		= perf_iommu_add,
+		.del		= perf_iommu_del,
+		.start		= perf_iommu_start,
+		.stop		= perf_iommu_stop,
+		.read		= perf_iommu_read,
+	},
+	.max_banks		= 0x00,
+	.max_counters		= 0x00,
+	.cntr_assign_mask	= 0ULL,
+	.format_group		= NULL,
+	.cpumask_group		= NULL,
+	.events_group		= NULL,
+	.null_group		= NULL,
+};
+
+static __init int amd_iommu_pc_init(void)
+{
+	/* Make sure the IOMMU PC resource is available */
+	if (!amd_iommu_pc_supported()) {
+		pr_err("perf: amd_iommu PMU not installed. No support!\n");
+		return -ENODEV;
+	}
+
+	_init_perf_amd_iommu(&__perf_iommu, "amd_iommu");
+
+	return 0;
+}
+
+device_initcall(amd_iommu_pc_init);
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.h b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
new file mode 100644
index 0000000..845d173
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <Steven.Kinney@amd.com>
+ * Author: Suravee Suthikulpanit <Suraveee.Suthikulpanit@amd.com>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _PERF_EVENT_AMD_IOMMU_H_
+#define _PERF_EVENT_AMD_IOMMU_H_
+
+/* iommu pc mmio region register indexes */
+#define IOMMU_PC_COUNTER_REG			0x00
+#define IOMMU_PC_COUNTER_SRC_REG		0x08
+#define IOMMU_PC_PASID_MATCH_REG		0x10
+#define IOMMU_PC_DOMID_MATCH_REG		0x18
+#define IOMMU_PC_DEVID_MATCH_REG		0x20
+#define IOMMU_PC_COUNTER_REPORT_REG		0x28
+
+/* maximun specified bank/counters */
+#define PC_MAX_SPEC_BNKS			64
+#define PC_MAX_SPEC_CNTRS			16
+
+/* iommu pc reg masks*/
+#define IOMMU_BASE_DEVID			0x0000
+
+/* amd_iommu_init.c external support functions */
+extern bool amd_iommu_pc_supported(void);
+
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr,
+			u8 fxn, u64 *value, bool is_write);
+
+#endif /*_PERF_EVENT_AMD_IOMMU_H_*/
-- 
1.7.10.4



^ permalink raw reply related	[flat|nested] 30+ messages in thread

* [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
  0 siblings, 0 replies; 30+ messages in thread
From: suravee.suthikulpanit-5C7GfCeVMHo @ 2013-05-17 19:43 UTC (permalink / raw)
  To: linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	joro-zLv9SwRftAIdnm+yROfE0A

From: Suravee Suthikulpanit <Suravee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org>

Implement a perf PMU to handle IOMMU performance counters and events.
The PMU only supports counting mode (e.g. perf stat). Since the counters
are shared across all cores, the PMU is implemented as "system-wide" mode.

To invoke the AMD IOMMU PMU, issue a perf tool command such as:

./perf stat -a -e amd_iommu/<events>/ <command>
or
./perf stat -a -e amd_iommu/config=<config-data>,config1=<config1-data>/ <command>

For example:

./perf stat -a -e amd_iommu/mem_trans_total/ <command>

The resulting count will be how many IOMMU total peripheral memory
operations were performed during the command execution window.

The IOMMU performance counter support is available starting in the 
AMD family15h model 0x30. For information regarding IOMMU performance 
counter configuration, please see the AMD IOMMU v2.5 specification.

Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
---
V3 Changes:
This version contain several changes:
	- Properly implement system-wide mode.
	- Add "amd_iommu/cpumask".
	- Add "amd_iommu/events"
	- Simplify "amd_iommu/format"
	- Add supports for devid, domid, and pasid filtering

 arch/x86/kernel/cpu/Makefile               |    2 +-
 arch/x86/kernel/cpu/perf_event_amd_iommu.c |  498 ++++++++++++++++++++++++++++
 arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
 3 files changed, 539 insertions(+), 1 deletion(-)
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
 create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h

diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
index b0684e4..fcbd3b8 100644
--- a/arch/x86/kernel/cpu/Makefile
+++ b/arch/x86/kernel/cpu/Makefile
@@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
 obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
 
 ifdef CONFIG_PERF_EVENTS
-obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
+obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_p6.o perf_event_knc.o perf_event_p4.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_lbr.o perf_event_intel_ds.o perf_event_intel.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= perf_event_intel_uncore.o
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.c b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
new file mode 100644
index 0000000..fad04f2
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.c
@@ -0,0 +1,498 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <Steven.Kinney-5C7GfCeVMHo@public.gmane.org>
+ * Author: Suravee Suthikulpanit <Suraveee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org>
+ *
+ * Perf: amd_iommu - AMD IOMMU Performance Counter PMU implementation
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#include <linux/perf_event.h>
+#include <linux/module.h>
+#include <linux/cpumask.h>
+#include <linux/slab.h>
+
+#include "perf_event.h"
+#include "perf_event_amd_iommu.h"
+
+#define _GET_BANK(ev)       ((u8)(ev->hw.extra_reg.reg >> 8))
+#define _GET_CNTR(ev)       ((u8)(ev->hw.extra_reg.reg))
+
+/* iommu pmu config masks */
+#define _GET_CSOURCE(ev)    ((ev->hw.config & 0xFFULL))
+#define _GET_DEVID(ev)      ((ev->hw.config >> 8)  & 0xFFFFULL)
+#define _GET_PASID(ev)      ((ev->hw.config >> 24) & 0xFFFFULL)
+#define _GET_DOMID(ev)      ((ev->hw.config >> 40) & 0xFFFFULL)
+#define _GET_DEVID_MASK(ev) ((ev->hw.extra_reg.config)  & 0xFFFFULL)
+#define _GET_PASID_MASK(ev) ((ev->hw.extra_reg.config >> 16) & 0xFFFFULL)
+#define _GET_DOMID_MASK(ev) ((ev->hw.extra_reg.config >> 32) & 0xFFFFULL)
+
+static struct perf_amd_iommu __perf_iommu;
+
+struct perf_amd_iommu {
+	struct pmu pmu;
+	u8 max_banks;
+	u8 max_counters;
+	u64 cntr_assign_mask;
+	raw_spinlock_t lock;
+	const struct attribute_group *attr_groups[4];
+};
+
+#define format_group	attr_groups[0]
+#define cpumask_group	attr_groups[1]
+#define events_group	attr_groups[2]
+#define null_group	attr_groups[3]
+
+/*---------------------------------------------
+ * sysfs format attributes
+ *---------------------------------------------*/
+PMU_FORMAT_ATTR(csource,    "config:0-7");
+PMU_FORMAT_ATTR(devid,      "config:8-23");
+PMU_FORMAT_ATTR(pasid,      "config:24-39");
+PMU_FORMAT_ATTR(domid,      "config:40-55");
+PMU_FORMAT_ATTR(devid_mask, "config1:0-15");
+PMU_FORMAT_ATTR(pasid_mask, "config1:16-31");
+PMU_FORMAT_ATTR(domid_mask, "config1:32-47");
+
+static struct attribute *iommu_format_attrs[] = {
+	&format_attr_csource.attr,
+	&format_attr_devid.attr,
+	&format_attr_pasid.attr,
+	&format_attr_domid.attr,
+	&format_attr_devid_mask.attr,
+	&format_attr_pasid_mask.attr,
+	&format_attr_domid_mask.attr,
+	NULL,
+};
+
+static struct attribute_group amd_iommu_format_group = {
+	.name = "format",
+	.attrs = iommu_format_attrs,
+};
+
+/*---------------------------------------------
+ * sysfs events attributes
+ *---------------------------------------------*/
+struct amd_iommu_event_desc {
+	struct kobj_attribute attr;
+	const char *event;
+};
+
+static ssize_t _iommu_event_show(struct kobject *kobj,
+				struct kobj_attribute *attr, char *buf)
+{
+	struct amd_iommu_event_desc *event =
+		container_of(attr, struct amd_iommu_event_desc, attr);
+	return sprintf(buf, "%s\n", event->event);
+}
+
+#define AMD_IOMMU_EVENT_DESC(_name, _event)			\
+{								\
+	.attr  = __ATTR(_name, 0444, _iommu_event_show, NULL),	\
+	.event = _event,					\
+}
+
+static struct amd_iommu_event_desc amd_iommu_v2_event_descs[] = {
+	AMD_IOMMU_EVENT_DESC(mem_pass_untrans,        "csource=0x01"),
+	AMD_IOMMU_EVENT_DESC(mem_pass_pretrans,       "csource=0x02"),
+	AMD_IOMMU_EVENT_DESC(mem_pass_excl,           "csource=0x03"),
+	AMD_IOMMU_EVENT_DESC(mem_target_abort,        "csource=0x04"),
+	AMD_IOMMU_EVENT_DESC(mem_trans_total,         "csource=0x05"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_hit,   "csource=0x06"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pte_mis,   "csource=0x07"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_hit,   "csource=0x08"),
+	AMD_IOMMU_EVENT_DESC(mem_iommu_tlb_pde_mis,   "csource=0x09"),
+	AMD_IOMMU_EVENT_DESC(mem_dte_hit,             "csource=0x0a"),
+	AMD_IOMMU_EVENT_DESC(mem_dte_mis,             "csource=0x0b"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_tot,       "csource=0x0c"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_nst,       "csource=0x0d"),
+	AMD_IOMMU_EVENT_DESC(page_tbl_read_gst,       "csource=0x0e"),
+	AMD_IOMMU_EVENT_DESC(int_dte_hit,             "csource=0x0f"),
+	AMD_IOMMU_EVENT_DESC(int_dte_mis,             "csource=0x10"),
+	AMD_IOMMU_EVENT_DESC(cmd_processed,           "csource=0x11"),
+	AMD_IOMMU_EVENT_DESC(cmd_processed_inv,       "csource=0x12"),
+	AMD_IOMMU_EVENT_DESC(tlb_inv,                 "csource=0x13"),
+	{ /* end: all zeroes */ },
+};
+
+/*---------------------------------------------
+ * sysfs cpumask attributes
+ *---------------------------------------------*/
+static cpumask_t iommu_cpumask;
+
+static ssize_t _iommu_cpumask_show(struct device *dev,
+				   struct device_attribute *attr,
+				   char *buf)
+{
+	int n = cpulist_scnprintf(buf, PAGE_SIZE - 2, &iommu_cpumask);
+	buf[n++] = '\n';
+	buf[n] = '\0';
+	return n;
+}
+static DEVICE_ATTR(cpumask, S_IRUGO, _iommu_cpumask_show, NULL);
+
+static struct attribute *iommu_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL,
+};
+
+static struct attribute_group amd_iommu_cpumask_group = {
+	.attrs = iommu_cpumask_attrs,
+};
+
+/*---------------------------------------------*/
+
+static int get_next_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu)
+{
+	unsigned long flags;
+	int shift, bank, cntr, retval;
+	int max_banks = perf_iommu->max_banks;
+	int max_cntrs = perf_iommu->max_counters;
+
+	raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+
+	for (bank = 0, shift = 0; bank < max_banks; bank++) {
+		for (cntr = 0; cntr < max_cntrs; cntr++) {
+			shift = bank + (bank*3) + cntr;
+			if (perf_iommu->cntr_assign_mask & (1ULL<<shift)) {
+				continue;
+			} else {
+				perf_iommu->cntr_assign_mask |= (1ULL<<shift);
+				retval = ((u16)((u16)bank<<8) | (u8)(cntr));
+				goto out;
+			}
+		}
+	}
+	retval = -ENOSPC;
+out:
+	raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+	return retval;
+}
+
+static int clear_avail_iommu_bnk_cntr(struct perf_amd_iommu *perf_iommu,
+					u8 bank, u8 cntr)
+{
+	unsigned long flags;
+	int max_banks, max_cntrs;
+	int shift = 0;
+
+	max_banks = perf_iommu->max_banks;
+	max_cntrs = perf_iommu->max_counters;
+
+	if ((bank > max_banks) || (cntr > max_cntrs))
+		return -EINVAL;
+
+	shift = bank + cntr + (bank*3);
+
+	raw_spin_lock_irqsave(&perf_iommu->lock, flags);
+	perf_iommu->cntr_assign_mask &= ~(1ULL<<shift);
+	raw_spin_unlock_irqrestore(&perf_iommu->lock, flags);
+
+	return 0;
+}
+
+static int perf_iommu_event_init(struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct perf_amd_iommu *perf_iommu;
+	u64 config, config1;
+
+	/* test the event attr type check for PMU enumeration */
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/*
+	 * IOMMU counters are shared across all cores.
+	 * Therefore, it does not support per-process mode.
+	 * Also, it does not support event sampling mode.
+	 */
+	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
+		return -EINVAL;
+
+	/* IOMMU counters do not have usr/os/guest/host bits */
+	if (event->attr.exclude_user || event->attr.exclude_kernel ||
+	    event->attr.exclude_host || event->attr.exclude_guest)
+		return -EINVAL;
+
+	if (event->cpu < 0)
+		return -EINVAL;
+
+	perf_iommu = &__perf_iommu;
+
+	if (event->pmu != &perf_iommu->pmu)
+		return -ENOENT;
+
+	if (perf_iommu) {
+		config = event->attr.config;
+		config1 = event->attr.config1;
+	} else {
+		return -EINVAL;
+	}
+
+	/* integrate with iommu base devid (0000), assume one iommu */
+	perf_iommu->max_banks =
+		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
+	perf_iommu->max_counters =
+		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
+
+	/* update the hw_perf_event struct with the iommu config data */
+	hwc->config = config;
+	hwc->extra_reg.config = config1;
+
+	return 0;
+}
+
+static void perf_iommu_enable_event(struct perf_event *ev)
+{
+	u8 csource = _GET_CSOURCE(ev);
+	u16 devid = _GET_DEVID(ev);
+	u64 reg = 0ULL;
+
+	reg = csource;
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+
+	reg = 0ULL | devid | (_GET_DEVID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_DEVID_MATCH_REG, &reg, true);
+
+	reg = 0ULL | _GET_PASID(ev) | (_GET_PASID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_PASID_MATCH_REG, &reg, true);
+
+	reg = 0ULL | _GET_DOMID(ev) | (_GET_DOMID_MASK(ev) << 32);
+	if (reg)
+		reg |= (1UL << 31);
+	amd_iommu_pc_get_set_reg_val(devid,
+			_GET_BANK(ev), _GET_CNTR(ev) ,
+			 IOMMU_PC_DOMID_MATCH_REG, &reg, true);
+}
+
+static void perf_iommu_disable_event(struct perf_event *event)
+{
+	u64 reg = 0ULL;
+
+	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+			_GET_BANK(event), _GET_CNTR(event),
+			IOMMU_PC_COUNTER_SRC_REG, &reg, true);
+}
+
+static void perf_iommu_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	pr_debug("perf: amd_iommu:perf_iommu_start\n");
+	if (WARN_ON_ONCE(!(hwc->state & PERF_HES_STOPPED)))
+		return;
+
+	WARN_ON_ONCE(!(hwc->state & PERF_HES_UPTODATE));
+	hwc->state = 0;
+
+	if (flags & PERF_EF_RELOAD) {
+		u64 prev_raw_count =  local64_read(&hwc->prev_count);
+		amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+				_GET_BANK(event), _GET_CNTR(event),
+				IOMMU_PC_COUNTER_REG, &prev_raw_count, true);
+	}
+
+	perf_iommu_enable_event(event);
+	perf_event_update_userpage(event);
+
+}
+
+static void perf_iommu_read(struct perf_event *event)
+{
+	u64 count = 0ULL;
+	u64 prev_raw_count = 0ULL;
+	u64 delta = 0ULL;
+	struct hw_perf_event *hwc = &event->hw;
+	pr_debug("perf: amd_iommu:perf_iommu_read\n");
+
+	amd_iommu_pc_get_set_reg_val(_GET_DEVID(event),
+				_GET_BANK(event), _GET_CNTR(event),
+				IOMMU_PC_COUNTER_REG, &count, false);
+
+	/* IOMMU pc counter register is only 48 bits */
+	count &= 0xFFFFFFFFFFFFULL;
+
+	prev_raw_count =  local64_read(&hwc->prev_count);
+	if (local64_cmpxchg(&hwc->prev_count, prev_raw_count,
+					count) != prev_raw_count)
+		return;
+
+	delta = count - prev_raw_count;
+	local64_add(delta, &event->count);
+
+}
+
+static void perf_iommu_stop(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	u64 config;
+
+	pr_debug("perf: amd_iommu:perf_iommu_stop\n");
+
+	if (hwc->state & PERF_HES_UPTODATE)
+		return;
+
+	perf_iommu_disable_event(event);
+	WARN_ON_ONCE(hwc->state & PERF_HES_STOPPED);
+	hwc->state |= PERF_HES_STOPPED;
+
+	if (hwc->state & PERF_HES_UPTODATE)
+		return;
+
+	config = hwc->config;
+	perf_iommu_read(event);
+	hwc->state |= PERF_HES_UPTODATE;
+}
+
+static int perf_iommu_add(struct perf_event *event, int flags)
+{
+	int retval;
+	struct perf_amd_iommu *perf_iommu =
+			container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+	pr_debug("perf: amd_iommu:perf_iommu_add\n");
+	event->hw.state = PERF_HES_UPTODATE | PERF_HES_STOPPED;
+
+	/* request an iommu bank/counter */
+	retval = get_next_avail_iommu_bnk_cntr(perf_iommu);
+	if (retval != -ENOSPC)
+		event->hw.extra_reg.reg = (u16)retval;
+	else
+		return retval;
+
+	if (flags & PERF_EF_START)
+		perf_iommu_start(event, PERF_EF_RELOAD);
+
+	return 0;
+}
+
+static void perf_iommu_del(struct perf_event *event, int flags)
+{
+	struct perf_amd_iommu *perf_iommu =
+			container_of(event->pmu, struct perf_amd_iommu, pmu);
+
+	pr_debug("perf: amd_iommu:perf_iommu_del\n");
+	perf_iommu_stop(event, PERF_EF_UPDATE);
+
+	/* clear the assigned iommu bank/counter */
+	clear_avail_iommu_bnk_cntr(perf_iommu,
+				     _GET_BANK(event),
+				     _GET_CNTR(event));
+
+	perf_event_update_userpage(event);
+}
+
+static __init int _init_events_attrs(struct perf_amd_iommu *perf_iommu)
+{
+	struct attribute **attrs;
+	struct attribute_group *attr_group;
+	int i = 0, j;
+
+	while (amd_iommu_v2_event_descs[i].attr.attr.name)
+		i++;
+
+	attr_group = kzalloc(sizeof(struct attribute *)
+		* (i + 1) + sizeof(*attr_group), GFP_KERNEL);
+	if (!attr_group)
+		return -ENOMEM;
+
+	attrs = (struct attribute **)(attr_group + 1);
+	for (j = 0; j < i; j++)
+		attrs[j] = &amd_iommu_v2_event_descs[j].attr.attr;
+
+	attr_group->name = "events";
+	attr_group->attrs = attrs;
+	perf_iommu->events_group = attr_group;
+
+	return 0;
+}
+
+static __init void amd_iommu_pc_exit(void)
+{
+	if (__perf_iommu.events_group != NULL) {
+		kfree(__perf_iommu.events_group);
+		__perf_iommu.events_group = NULL;
+	}
+}
+
+static __init int _init_perf_amd_iommu(
+	struct perf_amd_iommu *perf_iommu, char *name)
+{
+	int ret;
+
+	raw_spin_lock_init(&perf_iommu->lock);
+
+	/* Init format attributes */
+	perf_iommu->format_group = &amd_iommu_format_group;
+
+	/* Init cpumask attributes to only core 0 */
+	cpumask_set_cpu(0, &iommu_cpumask);
+	perf_iommu->cpumask_group = &amd_iommu_cpumask_group;
+
+	/* Init events attributes */
+	if (_init_events_attrs(perf_iommu) != 0)
+		pr_err("perf: amd_iommu: Only support raw events.\n");
+
+	/* Init null attributes */
+	perf_iommu->null_group = NULL;
+	perf_iommu->pmu.attr_groups = perf_iommu->attr_groups;
+
+	ret = perf_pmu_register(&perf_iommu->pmu, name, -1);
+	if (ret) {
+		pr_err("perf: amd_iommu: Failed to initialized.\n");
+		amd_iommu_pc_exit();
+	} else {
+		pr_info("perf: amd_iommu: Detected. (%d banks, %d counters/bank)\n",
+			amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID),
+			amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID));
+	}
+
+	return ret;
+}
+
+static struct perf_amd_iommu __perf_iommu = {
+	.pmu = {
+		.event_init	= perf_iommu_event_init,
+		.add		= perf_iommu_add,
+		.del		= perf_iommu_del,
+		.start		= perf_iommu_start,
+		.stop		= perf_iommu_stop,
+		.read		= perf_iommu_read,
+	},
+	.max_banks		= 0x00,
+	.max_counters		= 0x00,
+	.cntr_assign_mask	= 0ULL,
+	.format_group		= NULL,
+	.cpumask_group		= NULL,
+	.events_group		= NULL,
+	.null_group		= NULL,
+};
+
+static __init int amd_iommu_pc_init(void)
+{
+	/* Make sure the IOMMU PC resource is available */
+	if (!amd_iommu_pc_supported()) {
+		pr_err("perf: amd_iommu PMU not installed. No support!\n");
+		return -ENODEV;
+	}
+
+	_init_perf_amd_iommu(&__perf_iommu, "amd_iommu");
+
+	return 0;
+}
+
+device_initcall(amd_iommu_pc_init);
diff --git a/arch/x86/kernel/cpu/perf_event_amd_iommu.h b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
new file mode 100644
index 0000000..845d173
--- /dev/null
+++ b/arch/x86/kernel/cpu/perf_event_amd_iommu.h
@@ -0,0 +1,40 @@
+/*
+ * Copyright (C) 2013 Advanced Micro Devices, Inc.
+ *
+ * Author: Steven Kinney <Steven.Kinney-5C7GfCeVMHo@public.gmane.org>
+ * Author: Suravee Suthikulpanit <Suraveee.Suthikulpanit-5C7GfCeVMHo@public.gmane.org>
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 as
+ * published by the Free Software Foundation.
+ */
+
+#ifndef _PERF_EVENT_AMD_IOMMU_H_
+#define _PERF_EVENT_AMD_IOMMU_H_
+
+/* iommu pc mmio region register indexes */
+#define IOMMU_PC_COUNTER_REG			0x00
+#define IOMMU_PC_COUNTER_SRC_REG		0x08
+#define IOMMU_PC_PASID_MATCH_REG		0x10
+#define IOMMU_PC_DOMID_MATCH_REG		0x18
+#define IOMMU_PC_DEVID_MATCH_REG		0x20
+#define IOMMU_PC_COUNTER_REPORT_REG		0x28
+
+/* maximun specified bank/counters */
+#define PC_MAX_SPEC_BNKS			64
+#define PC_MAX_SPEC_CNTRS			16
+
+/* iommu pc reg masks*/
+#define IOMMU_BASE_DEVID			0x0000
+
+/* amd_iommu_init.c external support functions */
+extern bool amd_iommu_pc_supported(void);
+
+extern u8 amd_iommu_pc_get_max_banks(u16 devid);
+
+extern u8 amd_iommu_pc_get_max_counters(u16 devid);
+
+extern int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr,
+			u8 fxn, u64 *value, bool is_write);
+
+#endif /*_PERF_EVENT_AMD_IOMMU_H_*/
-- 
1.7.10.4

^ permalink raw reply related	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-20 15:41   ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-20 15:41 UTC (permalink / raw)
  To: suravee.suthikulpanit; +Cc: linux-kernel, a.p.zijlstra, mingo, iommu, joro

Peter,

Please let me know if you have any questions/concerns regarding the PMU 
implementation.

Joerg,
Please let me know if you have any questions/concerns regarding the 
changes in the IOMMU driver.

Thank you,

Suravee

On 5/17/2013 2:43 PM, suravee.suthikulpanit@amd.com wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>
> These patches implement the AMD IOMMU Performance Counter functionality
> via custom perf PMU and implement static counting for various IOMMU
> translations.
>
>      1) Extend the AMD IOMMU initialization to include performance
>         counter enablement.
>
>      2) The perf AMD IOMMU PMU to manage performance counters, which
>         interface with the AMD IOMMU core driver.
>
> Steven L Kinney (1):
>    perf/x86/amd: Adding IOMMU PC resource management
>
> Suravee Suthikulpanit (1):
>    perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
>
>   arch/x86/kernel/cpu/Makefile               |    2 +-
>   arch/x86/kernel/cpu/perf_event_amd_iommu.c |  500 ++++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
>   drivers/iommu/amd_iommu_init.c             |  121 ++++++-
>   drivers/iommu/amd_iommu_proto.h            |    7 +
>   drivers/iommu/amd_iommu_types.h            |   12 +-
>   6 files changed, 675 insertions(+), 7 deletions(-)
>   create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
>   create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-20 15:41   ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-20 15:41 UTC (permalink / raw)
  To: suravee.suthikulpanit-5C7GfCeVMHo
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

Peter,

Please let me know if you have any questions/concerns regarding the PMU 
implementation.

Joerg,
Please let me know if you have any questions/concerns regarding the 
changes in the IOMMU driver.

Thank you,

Suravee

On 5/17/2013 2:43 PM, suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org wrote:
> From: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
>
> These patches implement the AMD IOMMU Performance Counter functionality
> via custom perf PMU and implement static counting for various IOMMU
> translations.
>
>      1) Extend the AMD IOMMU initialization to include performance
>         counter enablement.
>
>      2) The perf AMD IOMMU PMU to manage performance counters, which
>         interface with the AMD IOMMU core driver.
>
> Steven L Kinney (1):
>    perf/x86/amd: Adding IOMMU PC resource management
>
> Suravee Suthikulpanit (1):
>    perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
>
>   arch/x86/kernel/cpu/Makefile               |    2 +-
>   arch/x86/kernel/cpu/perf_event_amd_iommu.c |  500 ++++++++++++++++++++++++++++
>   arch/x86/kernel/cpu/perf_event_amd_iommu.h |   40 +++
>   drivers/iommu/amd_iommu_init.c             |  121 ++++++-
>   drivers/iommu/amd_iommu_proto.h            |    7 +
>   drivers/iommu/amd_iommu_types.h            |   12 +-
>   6 files changed, 675 insertions(+), 7 deletions(-)
>   create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.c
>   create mode 100644 arch/x86/kernel/cpu/perf_event_amd_iommu.h
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
  2013-05-20 15:41   ` Suravee Suthikulanit
  (?)
@ 2013-05-21  9:11   ` Peter Zijlstra
  2013-05-21 13:29       ` Suravee Suthikulanit
  -1 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-21  9:11 UTC (permalink / raw)
  To: Suravee Suthikulanit; +Cc: linux-kernel, mingo, iommu, joro

On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulanit wrote:
> Peter,
> 
> Please let me know if you have any questions/concerns regarding the PMU
> implementation.

Looks good, how would you like to go about merging this? Should I push
it through Ingo's tree or do you prefer it goes through some IOMMU tree?

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 13:29       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-21 13:29 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: linux-kernel, mingo, iommu, joro

On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulanit wrote:
>> Peter,
>>
>> Please let me know if you have any questions/concerns regarding the PMU
>> implementation.
> Looks good, how would you like to go about merging this? Should I push
> it through Ingo's tree or do you prefer it goes through some IOMMU tree?
>
Thank you, Peter.  Since this is mostly PERF-related feature, I think it 
would be appropriate to push this through Ingo's tree.

Suravee


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 13:29       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-21 13:29 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulanit wrote:
>> Peter,
>>
>> Please let me know if you have any questions/concerns regarding the PMU
>> implementation.
> Looks good, how would you like to go about merging this? Should I push
> it through Ingo's tree or do you prefer it goes through some IOMMU tree?
>
Thank you, Peter.  Since this is mostly PERF-related feature, I think it 
would be appropriate to push this through Ingo's tree.

Suravee

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 13:52         ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-21 13:52 UTC (permalink / raw)
  To: Suravee Suthikulanit; +Cc: linux-kernel, mingo, iommu, joro

On Tue, May 21, 2013 at 08:29:47AM -0500, Suravee Suthikulanit wrote:
> On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> >On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulanit wrote:
> >>Peter,
> >>
> >>Please let me know if you have any questions/concerns regarding the PMU
> >>implementation.
> >Looks good, how would you like to go about merging this? Should I push
> >it through Ingo's tree or do you prefer it goes through some IOMMU tree?
> >
> Thank you, Peter.  Since this is mostly PERF-related feature, I think it
> would be appropriate to push this through Ingo's tree.

OK, I'll take them and will push them to Ingo. 

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 13:52         ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-21 13:52 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, May 21, 2013 at 08:29:47AM -0500, Suravee Suthikulanit wrote:
> On 5/21/2013 4:11 AM, Peter Zijlstra wrote:
> >On Mon, May 20, 2013 at 10:41:29AM -0500, Suravee Suthikulanit wrote:
> >>Peter,
> >>
> >>Please let me know if you have any questions/concerns regarding the PMU
> >>implementation.
> >Looks good, how would you like to go about merging this? Should I push
> >it through Ingo's tree or do you prefer it goes through some IOMMU tree?
> >
> Thank you, Peter.  Since this is mostly PERF-related feature, I think it
> would be appropriate to push this through Ingo's tree.

OK, I'll take them and will push them to Ingo. 

Thanks!

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 14:25           ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-21 14:25 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Suravee Suthikulanit, linux-kernel, mingo, iommu

Hi Peter,

On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> OK, I'll take them and will push them to Ingo.

Please wait with that until I had a look at the IOMMU pieces.

Thanks,

	Joerg



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 14:25           ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-21 14:25 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Peter,

On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> OK, I'll take them and will push them to Ingo.

Please wait with that until I had a look at the IOMMU pieces.

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 15:36             ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-21 15:36 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Suravee Suthikulanit, linux-kernel, mingo, iommu

On Tue, May 21, 2013 at 04:25:23PM +0200, Joerg Roedel wrote:
> Hi Peter,
> 
> On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> > OK, I'll take them and will push them to Ingo.
> 
> Please wait with that until I had a look at the IOMMU pieces.

Sure thing, I'll wait until I hear from you again.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-21 15:36             ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-21 15:36 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

On Tue, May 21, 2013 at 04:25:23PM +0200, Joerg Roedel wrote:
> Hi Peter,
> 
> On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> > OK, I'll take them and will push them to Ingo.
> 
> Please wait with that until I had a look at the IOMMU pieces.

Sure thing, I'll wait until I hear from you again.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
  2013-05-21 14:25           ` Joerg Roedel
  (?)
  (?)
@ 2013-05-27 16:30           ` Peter Zijlstra
  2013-05-27 17:26               ` Joerg Roedel
  -1 siblings, 1 reply; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-27 16:30 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: Suravee Suthikulanit, linux-kernel, mingo, iommu

On Tue, May 21, 2013 at 04:25:23PM +0200, Joerg Roedel wrote:
> Hi Peter,
> 
> On Tue, May 21, 2013 at 03:52:31PM +0200, Peter Zijlstra wrote:
> > OK, I'll take them and will push them to Ingo.
> 
> Please wait with that until I had a look at the IOMMU pieces.

Have you had time to look at this yet? -- mostly sending this email to
make sure I don't forget about these patches.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-27 17:26               ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-27 17:26 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: Suravee Suthikulanit, linux-kernel, mingo, iommu

Hi Peter,

On Mon, May 27, 2013 at 06:30:23PM +0200, Peter Zijlstra wrote:
> Have you had time to look at this yet? -- mostly sending this email to
> make sure I don't forget about these patches.

Sorry about that, I didn't forget it. But I was traveling last week and
had an outage of my root-server (where all my email is) over the
weekend. But it should all be back to normal now and I will look at
these patches tomorrow.

Thanks,

	Joerg



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support
@ 2013-05-27 17:26               ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-27 17:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA

Hi Peter,

On Mon, May 27, 2013 at 06:30:23PM +0200, Peter Zijlstra wrote:
> Have you had time to look at this yet? -- mostly sending this email to
> make sure I don't forget about these patches.

Sorry about that, I didn't forget it. But I was traveling last week and
had an outage of my root-server (where all my email is) over the
weekend. But it should all be back to normal now and I will look at
these patches tomorrow.

Thanks,

	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-28 11:07     ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-28 11:07 UTC (permalink / raw)
  To: suravee.suthikulpanit
  Cc: linux-kernel, a.p.zijlstra, mingo, iommu, Steven L Kinney

On Fri, May 17, 2013 at 02:43:31PM -0500, Suthikulpanit, Suravee wrote:
> From: Steven L Kinney <Steven.Kinney@amd.com>
> 
> Add functionality to check the availability of the AMD IOMMU Performance
> Counters and export this functionality to other core drivers, such as in this
> case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
> family/model other than the presence of the IOMMU with PC enabled.
> 
> The AMD IOMMU PC support static counting only at this time.
> 
> Signed-off-by: Steven Kinney <steven.kinney@amd.com>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
> ---
> V2 Changes:
> 	- Modify the amd_iommu_pc_get_set_reg_val function 
> 	to support 64 bit values. 
> 	- Fix logic to properly clear amd_iommu_pc_present when
> 	initialization failed.
> 
>  drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/amd_iommu_proto.h |    7 +++
>  drivers/iommu/amd_iommu_types.h |   12 +++-
>  3 files changed, 134 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index bf51abb..395fa29 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
>  u32 amd_iommu_max_pasids __read_mostly = ~0;
>  
>  bool amd_iommu_v2_present __read_mostly;
> +bool amd_iommu_pc_present __read_mostly;
>  
>  bool amd_iommu_force_isolation __read_mostly;
>  
> @@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
>   */
>  static u8 __iomem * __init iommu_map_mmio_space(u64 address)
>  {
> -	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
> +	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {

This length needs to be different on IOMMUv1 systems where the MMI
region is only 16kb large. Only use MMIO_REG_END_OFFSET on IOMMUv2
systems.

> +static void init_iommu_perf_ctr(struct amd_iommu *iommu)
> +{
> +	u64 val = 0xabcd, val2 = 0;
> +
> +	if (!iommu_feature(iommu, FEATURE_PC))
> +		return;
> +
> +	amd_iommu_pc_present = true;
> +
> +	/* Check if the performance counters can be written to */
> +	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
> +	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
> +	    (val != val2)) {
> +		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
> +		amd_iommu_pc_present = false;
> +		return;
> +	}
> +
> +	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");

Why are this two strings? This makes it hard to grep for the message.


> +u8 amd_iommu_pc_get_max_banks(u16 devid)
> +{
> +	struct amd_iommu *iommu;
> +
> +	/* locate the iommu governing the devid */
> +	iommu = amd_iommu_rlookup_table[devid];
> +
> +	if (iommu)
> +		return iommu->max_banks;
> +
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);

You can't return -ENODEV in a function returning u8. Return int instead.

> +
> +bool amd_iommu_pc_supported(void)
> +{
> +	return amd_iommu_pc_present;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_supported);
> +
> +u8 amd_iommu_pc_get_max_counters(u16 devid)
> +{
> +	struct amd_iommu *iommu;
> +
> +	/* locate the iommu governing the devid */
> +	iommu = amd_iommu_rlookup_table[devid];
> +
> +	if (iommu)
> +		return iommu->max_counters;
> +
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);

Same here. Please don't return negative values as u8.

> +
> +int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
> +				    u64 *value, bool is_write)
> +{
> +	struct amd_iommu *iommu;
> +	u32 offset;
> +	u32 max_offset_lim;
> +
> +	/* Make sure the IOMMU PC resource is available */
> +	if (!amd_iommu_pc_present) {
> +		pr_info("AMD IOMMU - PC Not supported.\n");

This message is redundant. You already print about about performance
counter availability in the init code. Please remove it.

> +		return -ENODEV;
> +	}
> +
> +	/* locate the iommu associated with the device ID */
> +	iommu = amd_iommu_rlookup_table[devid];
> +	if (iommu == NULL)
> +		return -ENODEV;
> +
> +	/* check for valid iommu pc register indexing */
> +	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
> +		return -ENODEV;

Does this check for calling errors? If so, you might consider turning
this into a WARN_ON.

> +
> +	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
> +
> +	/* limit the offset to the hw defined mmio region aperture */
> +	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
> +				(iommu->max_counters << 8) | 0x28);
> +	if ((offset < MMIO_CNTR_REG_OFFSET) ||
> +	    (offset > max_offset_lim))
> +		return -EINVAL;
> +
> +	if (is_write) {
> +		writel((u32)*value, iommu->mmio_base + offset);
> +		writel((*value >> 32), iommu->mmio_base + offset + 4);
> +	} else {
> +		*value = readl(iommu->mmio_base + offset + 4);
> +		*value <<= 32;
> +		*value = readl(iommu->mmio_base + offset);
> +	}
> +
> +	return 0;
> +}


	Joerg



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-28 11:07     ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-28 11:07 UTC (permalink / raw)
  To: suravee.suthikulpanit-5C7GfCeVMHo
  Cc: Steven L Kinney,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

On Fri, May 17, 2013 at 02:43:31PM -0500, Suthikulpanit, Suravee wrote:
> From: Steven L Kinney <Steven.Kinney-5C7GfCeVMHo@public.gmane.org>
> 
> Add functionality to check the availability of the AMD IOMMU Performance
> Counters and export this functionality to other core drivers, such as in this
> case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
> family/model other than the presence of the IOMMU with PC enabled.
> 
> The AMD IOMMU PC support static counting only at this time.
> 
> Signed-off-by: Steven Kinney <steven.kinney-5C7GfCeVMHo@public.gmane.org>
> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
> ---
> V2 Changes:
> 	- Modify the amd_iommu_pc_get_set_reg_val function 
> 	to support 64 bit values. 
> 	- Fix logic to properly clear amd_iommu_pc_present when
> 	initialization failed.
> 
>  drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
>  drivers/iommu/amd_iommu_proto.h |    7 +++
>  drivers/iommu/amd_iommu_types.h |   12 +++-
>  3 files changed, 134 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
> index bf51abb..395fa29 100644
> --- a/drivers/iommu/amd_iommu_init.c
> +++ b/drivers/iommu/amd_iommu_init.c
> @@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
>  u32 amd_iommu_max_pasids __read_mostly = ~0;
>  
>  bool amd_iommu_v2_present __read_mostly;
> +bool amd_iommu_pc_present __read_mostly;
>  
>  bool amd_iommu_force_isolation __read_mostly;
>  
> @@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
>   */
>  static u8 __iomem * __init iommu_map_mmio_space(u64 address)
>  {
> -	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
> +	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {

This length needs to be different on IOMMUv1 systems where the MMI
region is only 16kb large. Only use MMIO_REG_END_OFFSET on IOMMUv2
systems.

> +static void init_iommu_perf_ctr(struct amd_iommu *iommu)
> +{
> +	u64 val = 0xabcd, val2 = 0;
> +
> +	if (!iommu_feature(iommu, FEATURE_PC))
> +		return;
> +
> +	amd_iommu_pc_present = true;
> +
> +	/* Check if the performance counters can be written to */
> +	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
> +	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
> +	    (val != val2)) {
> +		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
> +		amd_iommu_pc_present = false;
> +		return;
> +	}
> +
> +	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");

Why are this two strings? This makes it hard to grep for the message.


> +u8 amd_iommu_pc_get_max_banks(u16 devid)
> +{
> +	struct amd_iommu *iommu;
> +
> +	/* locate the iommu governing the devid */
> +	iommu = amd_iommu_rlookup_table[devid];
> +
> +	if (iommu)
> +		return iommu->max_banks;
> +
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);

You can't return -ENODEV in a function returning u8. Return int instead.

> +
> +bool amd_iommu_pc_supported(void)
> +{
> +	return amd_iommu_pc_present;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_supported);
> +
> +u8 amd_iommu_pc_get_max_counters(u16 devid)
> +{
> +	struct amd_iommu *iommu;
> +
> +	/* locate the iommu governing the devid */
> +	iommu = amd_iommu_rlookup_table[devid];
> +
> +	if (iommu)
> +		return iommu->max_counters;
> +
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);

Same here. Please don't return negative values as u8.

> +
> +int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
> +				    u64 *value, bool is_write)
> +{
> +	struct amd_iommu *iommu;
> +	u32 offset;
> +	u32 max_offset_lim;
> +
> +	/* Make sure the IOMMU PC resource is available */
> +	if (!amd_iommu_pc_present) {
> +		pr_info("AMD IOMMU - PC Not supported.\n");

This message is redundant. You already print about about performance
counter availability in the init code. Please remove it.

> +		return -ENODEV;
> +	}
> +
> +	/* locate the iommu associated with the device ID */
> +	iommu = amd_iommu_rlookup_table[devid];
> +	if (iommu == NULL)
> +		return -ENODEV;
> +
> +	/* check for valid iommu pc register indexing */
> +	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
> +		return -ENODEV;

Does this check for calling errors? If so, you might consider turning
this into a WARN_ON.

> +
> +	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
> +
> +	/* limit the offset to the hw defined mmio region aperture */
> +	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
> +				(iommu->max_counters << 8) | 0x28);
> +	if ((offset < MMIO_CNTR_REG_OFFSET) ||
> +	    (offset > max_offset_lim))
> +		return -EINVAL;
> +
> +	if (is_write) {
> +		writel((u32)*value, iommu->mmio_base + offset);
> +		writel((*value >> 32), iommu->mmio_base + offset + 4);
> +	} else {
> +		*value = readl(iommu->mmio_base + offset + 4);
> +		*value <<= 32;
> +		*value = readl(iommu->mmio_base + offset);
> +	}
> +
> +	return 0;
> +}


	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-28 12:18     ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-28 12:18 UTC (permalink / raw)
  To: suravee.suthikulpanit; +Cc: linux-kernel, a.p.zijlstra, mingo, iommu

On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index b0684e4..fcbd3b8 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
>  obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
>  
>  ifdef CONFIG_PERF_EVENTS
> -obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
> +obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o

This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
compiled in it doesn't make sense to have that PMU.

> +static int perf_iommu_event_init(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct perf_amd_iommu *perf_iommu;
> +	u64 config, config1;
> +
> +	/* test the event attr type check for PMU enumeration */
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	/*
> +	 * IOMMU counters are shared across all cores.
> +	 * Therefore, it does not support per-process mode.
> +	 * Also, it does not support event sampling mode.
> +	 */
> +	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> +		return -EINVAL;
> +
> +	/* IOMMU counters do not have usr/os/guest/host bits */
> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
> +	if (event->cpu < 0)
> +		return -EINVAL;
> +
> +	perf_iommu = &__perf_iommu;
> +
> +	if (event->pmu != &perf_iommu->pmu)
> +		return -ENOENT;
> +
> +	if (perf_iommu) {
> +		config = event->attr.config;
> +		config1 = event->attr.config1;
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	/* integrate with iommu base devid (0000), assume one iommu */
> +	perf_iommu->max_banks =
> +		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> +	perf_iommu->max_counters =
> +		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> +	/* update the hw_perf_event struct with the iommu config data */
> +	hwc->config = config;
> +	hwc->extra_reg.config = config1;
> +
> +	return 0;
> +}

That implementation is very basic. Any reason for not using the event
reporting mechanism of the IOMMU? You could implement a nice perf
iommutop or something to see which devices do the most transactions or
something like that.


	Joerg



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-28 12:18     ` Joerg Roedel
  0 siblings, 0 replies; 30+ messages in thread
From: Joerg Roedel @ 2013-05-28 12:18 UTC (permalink / raw)
  To: suravee.suthikulpanit-5C7GfCeVMHo
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
> index b0684e4..fcbd3b8 100644
> --- a/arch/x86/kernel/cpu/Makefile
> +++ b/arch/x86/kernel/cpu/Makefile
> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
>  obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
>  
>  ifdef CONFIG_PERF_EVENTS
> -obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
> +obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o

This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
compiled in it doesn't make sense to have that PMU.

> +static int perf_iommu_event_init(struct perf_event *event)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct perf_amd_iommu *perf_iommu;
> +	u64 config, config1;
> +
> +	/* test the event attr type check for PMU enumeration */
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	/*
> +	 * IOMMU counters are shared across all cores.
> +	 * Therefore, it does not support per-process mode.
> +	 * Also, it does not support event sampling mode.
> +	 */
> +	if (is_sampling_event(event) || event->attach_state & PERF_ATTACH_TASK)
> +		return -EINVAL;
> +
> +	/* IOMMU counters do not have usr/os/guest/host bits */
> +	if (event->attr.exclude_user || event->attr.exclude_kernel ||
> +	    event->attr.exclude_host || event->attr.exclude_guest)
> +		return -EINVAL;
> +
> +	if (event->cpu < 0)
> +		return -EINVAL;
> +
> +	perf_iommu = &__perf_iommu;
> +
> +	if (event->pmu != &perf_iommu->pmu)
> +		return -ENOENT;
> +
> +	if (perf_iommu) {
> +		config = event->attr.config;
> +		config1 = event->attr.config1;
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	/* integrate with iommu base devid (0000), assume one iommu */
> +	perf_iommu->max_banks =
> +		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> +	perf_iommu->max_counters =
> +		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> +	/* update the hw_perf_event struct with the iommu config data */
> +	hwc->config = config;
> +	hwc->extra_reg.config = config1;
> +
> +	return 0;
> +}

That implementation is very basic. Any reason for not using the event
reporting mechanism of the IOMMU? You could implement a nice perf
iommutop or something to see which devices do the most transactions or
something like that.


	Joerg

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-28 16:28       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-28 16:28 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: linux-kernel, a.p.zijlstra, mingo, iommu, Steven L Kinney

Joerg,

Thanks for the review.  I'll post new patch set to address your comment 
shortly.

Suravee

On 5/28/2013 6:07 AM, Joerg Roedel wrote:
> On Fri, May 17, 2013 at 02:43:31PM -0500, Suthikulpanit, Suravee wrote:
>> From: Steven L Kinney <Steven.Kinney@amd.com>
>>
>> Add functionality to check the availability of the AMD IOMMU Performance
>> Counters and export this functionality to other core drivers, such as in this
>> case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
>> family/model other than the presence of the IOMMU with PC enabled.
>>
>> The AMD IOMMU PC support static counting only at this time.
>>
>> Signed-off-by: Steven Kinney <steven.kinney@amd.com>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit@amd.com>
>> ---
>> V2 Changes:
>> 	- Modify the amd_iommu_pc_get_set_reg_val function
>> 	to support 64 bit values.
>> 	- Fix logic to properly clear amd_iommu_pc_present when
>> 	initialization failed.
>>
>>   drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
>>   drivers/iommu/amd_iommu_proto.h |    7 +++
>>   drivers/iommu/amd_iommu_types.h |   12 +++-
>>   3 files changed, 134 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
>> index bf51abb..395fa29 100644
>> --- a/drivers/iommu/amd_iommu_init.c
>> +++ b/drivers/iommu/amd_iommu_init.c
>> @@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
>>   u32 amd_iommu_max_pasids __read_mostly = ~0;
>>   
>>   bool amd_iommu_v2_present __read_mostly;
>> +bool amd_iommu_pc_present __read_mostly;
>>   
>>   bool amd_iommu_force_isolation __read_mostly;
>>   
>> @@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
>>    */
>>   static u8 __iomem * __init iommu_map_mmio_space(u64 address)
>>   {
>> -	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
>> +	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {
> This length needs to be different on IOMMUv1 systems where the MMI
> region is only 16kb large. Only use MMIO_REG_END_OFFSET on IOMMUv2
> systems.
>
>> +static void init_iommu_perf_ctr(struct amd_iommu *iommu)
>> +{
>> +	u64 val = 0xabcd, val2 = 0;
>> +
>> +	if (!iommu_feature(iommu, FEATURE_PC))
>> +		return;
>> +
>> +	amd_iommu_pc_present = true;
>> +
>> +	/* Check if the performance counters can be written to */
>> +	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
>> +	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
>> +	    (val != val2)) {
>> +		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
>> +		amd_iommu_pc_present = false;
>> +		return;
>> +	}
>> +
>> +	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");
> Why are this two strings? This makes it hard to grep for the message.
>
>
>> +u8 amd_iommu_pc_get_max_banks(u16 devid)
>> +{
>> +	struct amd_iommu *iommu;
>> +
>> +	/* locate the iommu governing the devid */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +
>> +	if (iommu)
>> +		return iommu->max_banks;
>> +
>> +	return -ENODEV;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);
> You can't return -ENODEV in a function returning u8. Return int instead.
>
>> +
>> +bool amd_iommu_pc_supported(void)
>> +{
>> +	return amd_iommu_pc_present;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_supported);
>> +
>> +u8 amd_iommu_pc_get_max_counters(u16 devid)
>> +{
>> +	struct amd_iommu *iommu;
>> +
>> +	/* locate the iommu governing the devid */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +
>> +	if (iommu)
>> +		return iommu->max_counters;
>> +
>> +	return -ENODEV;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);
> Same here. Please don't return negative values as u8.
>
>> +
>> +int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
>> +				    u64 *value, bool is_write)
>> +{
>> +	struct amd_iommu *iommu;
>> +	u32 offset;
>> +	u32 max_offset_lim;
>> +
>> +	/* Make sure the IOMMU PC resource is available */
>> +	if (!amd_iommu_pc_present) {
>> +		pr_info("AMD IOMMU - PC Not supported.\n");
> This message is redundant. You already print about about performance
> counter availability in the init code. Please remove it.
>
>> +		return -ENODEV;
>> +	}
>> +
>> +	/* locate the iommu associated with the device ID */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +	if (iommu == NULL)
>> +		return -ENODEV;
>> +
>> +	/* check for valid iommu pc register indexing */
>> +	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
>> +		return -ENODEV;
> Does this check for calling errors? If so, you might consider turning
> this into a WARN_ON.
>
>> +
>> +	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
>> +
>> +	/* limit the offset to the hw defined mmio region aperture */
>> +	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
>> +				(iommu->max_counters << 8) | 0x28);
>> +	if ((offset < MMIO_CNTR_REG_OFFSET) ||
>> +	    (offset > max_offset_lim))
>> +		return -EINVAL;
>> +
>> +	if (is_write) {
>> +		writel((u32)*value, iommu->mmio_base + offset);
>> +		writel((*value >> 32), iommu->mmio_base + offset + 4);
>> +	} else {
>> +		*value = readl(iommu->mmio_base + offset + 4);
>> +		*value <<= 32;
>> +		*value = readl(iommu->mmio_base + offset);
>> +	}
>> +
>> +	return 0;
>> +}
>
> 	Joerg
>
>
>



^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management
@ 2013-05-28 16:28       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-28 16:28 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: Steven L Kinney,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

Joerg,

Thanks for the review.  I'll post new patch set to address your comment 
shortly.

Suravee

On 5/28/2013 6:07 AM, Joerg Roedel wrote:
> On Fri, May 17, 2013 at 02:43:31PM -0500, Suthikulpanit, Suravee wrote:
>> From: Steven L Kinney <Steven.Kinney-5C7GfCeVMHo@public.gmane.org>
>>
>> Add functionality to check the availability of the AMD IOMMU Performance
>> Counters and export this functionality to other core drivers, such as in this
>> case, a perf AMD IOMMU PMU.  This feature is not bound to any specific AMD
>> family/model other than the presence of the IOMMU with PC enabled.
>>
>> The AMD IOMMU PC support static counting only at this time.
>>
>> Signed-off-by: Steven Kinney <steven.kinney-5C7GfCeVMHo@public.gmane.org>
>> Signed-off-by: Suravee Suthikulpanit <suravee.suthikulpanit-5C7GfCeVMHo@public.gmane.org>
>> ---
>> V2 Changes:
>> 	- Modify the amd_iommu_pc_get_set_reg_val function
>> 	to support 64 bit values.
>> 	- Fix logic to properly clear amd_iommu_pc_present when
>> 	initialization failed.
>>
>>   drivers/iommu/amd_iommu_init.c  |  121 ++++++++++++++++++++++++++++++++++++++-
>>   drivers/iommu/amd_iommu_proto.h |    7 +++
>>   drivers/iommu/amd_iommu_types.h |   12 +++-
>>   3 files changed, 134 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/iommu/amd_iommu_init.c b/drivers/iommu/amd_iommu_init.c
>> index bf51abb..395fa29 100644
>> --- a/drivers/iommu/amd_iommu_init.c
>> +++ b/drivers/iommu/amd_iommu_init.c
>> @@ -154,6 +154,7 @@ bool amd_iommu_iotlb_sup __read_mostly = true;
>>   u32 amd_iommu_max_pasids __read_mostly = ~0;
>>   
>>   bool amd_iommu_v2_present __read_mostly;
>> +bool amd_iommu_pc_present __read_mostly;
>>   
>>   bool amd_iommu_force_isolation __read_mostly;
>>   
>> @@ -371,21 +372,21 @@ static void iommu_disable(struct amd_iommu *iommu)
>>    */
>>   static u8 __iomem * __init iommu_map_mmio_space(u64 address)
>>   {
>> -	if (!request_mem_region(address, MMIO_REGION_LENGTH, "amd_iommu")) {
>> +	if (!request_mem_region(address, MMIO_REG_END_OFFSET, "amd_iommu")) {
> This length needs to be different on IOMMUv1 systems where the MMI
> region is only 16kb large. Only use MMIO_REG_END_OFFSET on IOMMUv2
> systems.
>
>> +static void init_iommu_perf_ctr(struct amd_iommu *iommu)
>> +{
>> +	u64 val = 0xabcd, val2 = 0;
>> +
>> +	if (!iommu_feature(iommu, FEATURE_PC))
>> +		return;
>> +
>> +	amd_iommu_pc_present = true;
>> +
>> +	/* Check if the performance counters can be written to */
>> +	if ((0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val, true)) ||
>> +	    (0 != amd_iommu_pc_get_set_reg_val(0, 0, 0, 0, &val2, false)) ||
>> +	    (val != val2)) {
>> +		pr_err("AMD-Vi: Unable to write to IOMMU perf counter.\n");
>> +		amd_iommu_pc_present = false;
>> +		return;
>> +	}
>> +
>> +	pr_info("AMD-Vi: IOMMU performance counters " "supported\n");
> Why are this two strings? This makes it hard to grep for the message.
>
>
>> +u8 amd_iommu_pc_get_max_banks(u16 devid)
>> +{
>> +	struct amd_iommu *iommu;
>> +
>> +	/* locate the iommu governing the devid */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +
>> +	if (iommu)
>> +		return iommu->max_banks;
>> +
>> +	return -ENODEV;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_get_max_banks);
> You can't return -ENODEV in a function returning u8. Return int instead.
>
>> +
>> +bool amd_iommu_pc_supported(void)
>> +{
>> +	return amd_iommu_pc_present;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_supported);
>> +
>> +u8 amd_iommu_pc_get_max_counters(u16 devid)
>> +{
>> +	struct amd_iommu *iommu;
>> +
>> +	/* locate the iommu governing the devid */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +
>> +	if (iommu)
>> +		return iommu->max_counters;
>> +
>> +	return -ENODEV;
>> +}
>> +EXPORT_SYMBOL(amd_iommu_pc_get_max_counters);
> Same here. Please don't return negative values as u8.
>
>> +
>> +int amd_iommu_pc_get_set_reg_val(u16 devid, u8 bank, u8 cntr, u8 fxn,
>> +				    u64 *value, bool is_write)
>> +{
>> +	struct amd_iommu *iommu;
>> +	u32 offset;
>> +	u32 max_offset_lim;
>> +
>> +	/* Make sure the IOMMU PC resource is available */
>> +	if (!amd_iommu_pc_present) {
>> +		pr_info("AMD IOMMU - PC Not supported.\n");
> This message is redundant. You already print about about performance
> counter availability in the init code. Please remove it.
>
>> +		return -ENODEV;
>> +	}
>> +
>> +	/* locate the iommu associated with the device ID */
>> +	iommu = amd_iommu_rlookup_table[devid];
>> +	if (iommu == NULL)
>> +		return -ENODEV;
>> +
>> +	/* check for valid iommu pc register indexing */
>> +	if (fxn < 0 || fxn > 0x28 || (fxn & 7))
>> +		return -ENODEV;
> Does this check for calling errors? If so, you might consider turning
> this into a WARN_ON.
>
>> +
>> +	offset = (u32)(((0x40|bank) << 12) | (cntr << 8) | fxn);
>> +
>> +	/* limit the offset to the hw defined mmio region aperture */
>> +	max_offset_lim = (u32)(((0x40|iommu->max_banks) << 12) |
>> +				(iommu->max_counters << 8) | 0x28);
>> +	if ((offset < MMIO_CNTR_REG_OFFSET) ||
>> +	    (offset > max_offset_lim))
>> +		return -EINVAL;
>> +
>> +	if (is_write) {
>> +		writel((u32)*value, iommu->mmio_base + offset);
>> +		writel((*value >> 32), iommu->mmio_base + offset + 4);
>> +	} else {
>> +		*value = readl(iommu->mmio_base + offset + 4);
>> +		*value <<= 32;
>> +		*value = readl(iommu->mmio_base + offset);
>> +	}
>> +
>> +	return 0;
>> +}
>
> 	Joerg
>
>
>

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-28 17:17       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-28 17:17 UTC (permalink / raw)
  To: Joerg Roedel; +Cc: linux-kernel, a.p.zijlstra, mingo, iommu

On 5/28/2013 7:18 AM, Joerg Roedel wrote:
> On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
>> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
>> index b0684e4..fcbd3b8 100644
>> --- a/arch/x86/kernel/cpu/Makefile
>> +++ b/arch/x86/kernel/cpu/Makefile
>> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
>>   obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
>>   
>>   ifdef CONFIG_PERF_EVENTS
>> -obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
>> +obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
> This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
> compiled in it doesn't make sense to have that PMU.
I will take care of this.
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	/* integrate with iommu base devid (0000), assume one iommu */
> +	perf_iommu->max_banks =
> +		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> +	perf_iommu->max_counters =
> +		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> +	/* update the hw_perf_event struct with the iommu config data */
> +	hwc->config = config;
> +	hwc->extra_reg.config = config1;
> +
> +	return 0;
> +}
> That implementation is very basic. Any reason for not using the event
> reporting mechanism of the IOMMU? You could implement a nice perf
> iommutop or something to see which devices do the most transactions or
> something like that.
>
>
> 	Joerg
This patch is adding perf system-wide counting mode support which is 
used by "perf stat" tool.  We are not implementing the sampling mode 
since MSI interrupt of the IOMMU cannot be used for current perf 
sampling tools (e.g. perf record or top) since the IOMMU counters are 
not core-specific.   The current "perf record" and "perf top" needs to 
attribute each sample to a particular core/pid which would allow the 
tools to figure out the instruction pointer and map the sample to a 
paticular module.

If I understand correctly, when you mentioned "perf iommutop", you want 
a new perf user-space tool which will show real-time IOMMU events per 
IOMMU HW and/or device?

Suravee


^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-28 17:17       ` Suravee Suthikulanit
  0 siblings, 0 replies; 30+ messages in thread
From: Suravee Suthikulanit @ 2013-05-28 17:17 UTC (permalink / raw)
  To: Joerg Roedel
  Cc: iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA,
	mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	a.p.zijlstra-/NLkJaSkS4VmR6Xm/wNWPw

On 5/28/2013 7:18 AM, Joerg Roedel wrote:
> On Fri, May 17, 2013 at 02:43:32PM -0500, Suthikulpanit, Suravee wrote:
>> diff --git a/arch/x86/kernel/cpu/Makefile b/arch/x86/kernel/cpu/Makefile
>> index b0684e4..fcbd3b8 100644
>> --- a/arch/x86/kernel/cpu/Makefile
>> +++ b/arch/x86/kernel/cpu/Makefile
>> @@ -30,7 +30,7 @@ obj-$(CONFIG_CPU_SUP_UMC_32)		+= umc.o
>>   obj-$(CONFIG_PERF_EVENTS)		+= perf_event.o
>>   
>>   ifdef CONFIG_PERF_EVENTS
>> -obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o
>> +obj-$(CONFIG_CPU_SUP_AMD)		+= perf_event_amd.o perf_event_amd_uncore.o perf_event_amd_iommu.o
> This should also depend on CONFIG_AMD_IOMMU. If no IOMMU driver is
> compiled in it doesn't make sense to have that PMU.
I will take care of this.
> +	} else {
> +		return -EINVAL;
> +	}
> +
> +	/* integrate with iommu base devid (0000), assume one iommu */
> +	perf_iommu->max_banks =
> +		amd_iommu_pc_get_max_banks(IOMMU_BASE_DEVID);
> +	perf_iommu->max_counters =
> +		amd_iommu_pc_get_max_counters(IOMMU_BASE_DEVID);
> +
> +	/* update the hw_perf_event struct with the iommu config data */
> +	hwc->config = config;
> +	hwc->extra_reg.config = config1;
> +
> +	return 0;
> +}
> That implementation is very basic. Any reason for not using the event
> reporting mechanism of the IOMMU? You could implement a nice perf
> iommutop or something to see which devices do the most transactions or
> something like that.
>
>
> 	Joerg
This patch is adding perf system-wide counting mode support which is 
used by "perf stat" tool.  We are not implementing the sampling mode 
since MSI interrupt of the IOMMU cannot be used for current perf 
sampling tools (e.g. perf record or top) since the IOMMU counters are 
not core-specific.   The current "perf record" and "perf top" needs to 
attribute each sample to a particular core/pid which would allow the 
tools to figure out the instruction pointer and map the sample to a 
paticular module.

If I understand correctly, when you mentioned "perf iommutop", you want 
a new perf user-space tool which will show real-time IOMMU events per 
IOMMU HW and/or device?

Suravee

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-29 11:24         ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-29 11:24 UTC (permalink / raw)
  To: Suravee Suthikulanit; +Cc: Joerg Roedel, linux-kernel, mingo, iommu

On Tue, May 28, 2013 at 12:17:28PM -0500, Suravee Suthikulanit wrote:
> On 5/28/2013 7:18 AM, Joerg Roedel wrote:

> >That implementation is very basic. Any reason for not using the event
> >reporting mechanism of the IOMMU? You could implement a nice perf
> >iommutop or something to see which devices do the most transactions or
> >something like that.

> This patch is adding perf system-wide counting mode support which is used by
> "perf stat" tool.  We are not implementing the sampling mode since MSI
> interrupt of the IOMMU cannot be used for current perf sampling tools (e.g.
> perf record or top) since the IOMMU counters are not core-specific.   The
> current "perf record" and "perf top" needs to attribute each sample to a
> particular core/pid which would allow the tools to figure out the
> instruction pointer and map the sample to a paticular module.
> 
> If I understand correctly, when you mentioned "perf iommutop", you want a
> new perf user-space tool which will show real-time IOMMU events per IOMMU HW
> and/or device?

Right, unless there's more to the IOMMU event reporting than setting an
event threshold to get an interrupt of kinds I don't see how an
interrupt would be useful except for making sure we don't loose a
counter overflow.

Note that for Intel uncore we poll with a software timer to ensure we
don't miss the overflow because its interrupt facility is either broken
or missing.

If otoh the event reporting thing includes more data than just 'hey
counter overflow!' it might be useful. Not exactly sure how yet because
it would be the first PMU to need this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

* Re: [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation
@ 2013-05-29 11:24         ` Peter Zijlstra
  0 siblings, 0 replies; 30+ messages in thread
From: Peter Zijlstra @ 2013-05-29 11:24 UTC (permalink / raw)
  To: Suravee Suthikulanit
  Cc: mingo-H+wXaHxf7aLQT0dZR+AlfA,
	linux-kernel-u79uwXL29TY76Z2rM5mHXA,
	iommu-cunTk1MwBs9QetFLy7KEm3xJsTq8ys+cHZ5vskTnxNA

On Tue, May 28, 2013 at 12:17:28PM -0500, Suravee Suthikulanit wrote:
> On 5/28/2013 7:18 AM, Joerg Roedel wrote:

> >That implementation is very basic. Any reason for not using the event
> >reporting mechanism of the IOMMU? You could implement a nice perf
> >iommutop or something to see which devices do the most transactions or
> >something like that.

> This patch is adding perf system-wide counting mode support which is used by
> "perf stat" tool.  We are not implementing the sampling mode since MSI
> interrupt of the IOMMU cannot be used for current perf sampling tools (e.g.
> perf record or top) since the IOMMU counters are not core-specific.   The
> current "perf record" and "perf top" needs to attribute each sample to a
> particular core/pid which would allow the tools to figure out the
> instruction pointer and map the sample to a paticular module.
> 
> If I understand correctly, when you mentioned "perf iommutop", you want a
> new perf user-space tool which will show real-time IOMMU events per IOMMU HW
> and/or device?

Right, unless there's more to the IOMMU event reporting than setting an
event threshold to get an interrupt of kinds I don't see how an
interrupt would be useful except for making sure we don't loose a
counter overflow.

Note that for Intel uncore we poll with a software timer to ensure we
don't miss the overflow because its interrupt facility is either broken
or missing.

If otoh the event reporting thing includes more data than just 'hey
counter overflow!' it might be useful. Not exactly sure how yet because
it would be the first PMU to need this.

^ permalink raw reply	[flat|nested] 30+ messages in thread

end of thread, other threads:[~2013-05-29 11:24 UTC | newest]

Thread overview: 30+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2013-05-17 19:43 [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support suravee.suthikulpanit
2013-05-17 19:43 ` suravee.suthikulpanit-5C7GfCeVMHo
2013-05-17 19:43 ` [PATCH 1/2 V3] perf/x86/amd: Adding IOMMU PC resource management suravee.suthikulpanit
2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
2013-05-28 11:07   ` Joerg Roedel
2013-05-28 11:07     ` Joerg Roedel
2013-05-28 16:28     ` Suravee Suthikulanit
2013-05-28 16:28       ` Suravee Suthikulanit
2013-05-17 19:43 ` [PATCH 2/2 V3] perf/x86/amd: AMD IOMMU PC PERF uncore PMU implementation suravee.suthikulpanit
2013-05-17 19:43   ` suravee.suthikulpanit-5C7GfCeVMHo
2013-05-28 12:18   ` Joerg Roedel
2013-05-28 12:18     ` Joerg Roedel
2013-05-28 17:17     ` Suravee Suthikulanit
2013-05-28 17:17       ` Suravee Suthikulanit
2013-05-29 11:24       ` Peter Zijlstra
2013-05-29 11:24         ` Peter Zijlstra
2013-05-20 15:41 ` [PATCH 0/2 V3] perf/x86/amd: IOMMU Performance Counter Support Suravee Suthikulanit
2013-05-20 15:41   ` Suravee Suthikulanit
2013-05-21  9:11   ` Peter Zijlstra
2013-05-21 13:29     ` Suravee Suthikulanit
2013-05-21 13:29       ` Suravee Suthikulanit
2013-05-21 13:52       ` Peter Zijlstra
2013-05-21 13:52         ` Peter Zijlstra
2013-05-21 14:25         ` Joerg Roedel
2013-05-21 14:25           ` Joerg Roedel
2013-05-21 15:36           ` Peter Zijlstra
2013-05-21 15:36             ` Peter Zijlstra
2013-05-27 16:30           ` Peter Zijlstra
2013-05-27 17:26             ` Joerg Roedel
2013-05-27 17:26               ` Joerg Roedel

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.