* [PATCH V2 0/5] Uncore PMON discovery mechanism support
@ 2021-03-17 17:59 kan.liang
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
                   ` (5 more replies)
  0 siblings, 6 replies; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang


From: Kan Liang <kan.liang@linux.intel.com>

Changes since V1:
- Use the generic rbtree functions, rb_add() and rb_find(). (Patch 1)
- Add a module parameter, uncore_no_discover. If users don't want the
  discovery feature, they can set uncore_no_discover=true. (Patch 1)


A mechanism of self-describing HW for the uncore PMON has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, SW can 'discover' all the standard uncore
PMON registers.

With the discovery mechanism, Perf can
- Retrieve the generic uncore unit information of all standard uncore
  blocks, e.g., the address of counters, the address of the counter
  control, the counter width, the access type, etc.
  Perf can provide basic uncore support based on this information.
  For a new platform, perf users will get basic uncore support even if
  the platform-specific enabling code is not ready yet.
- Retrieve accurate uncore unit information, e.g., the number of uncore
  boxes. The number of uncore boxes may differ among machines.
  Perf currently hardcodes the max number of uncore blocks. On some
  machines, perf may create a PMU for an unavailable uncore block.
  Although there is no harm (reads of an unavailable uncore block
  always return 0), it may confuse users. The discovery mechanism
  provides the accurate number of available uncore boxes on a machine.

But the discovery mechanism has some limitations:
- It relies on BIOS support. If a BIOS doesn't support the discovery
  mechanism, the uncore driver exits with -ENODEV, and nothing is
  changed.
- It only provides the generic uncore unit information. The information
  for advanced features, such as fixed counters, filters, and
  constraints, cannot be retrieved.
- It only supports the standard PMON blocks. Non-standard PMON blocks,
  e.g., free-running counters, are not supported.
- It only provides an ID for an uncore block, not a meaningful name.
  uncore_type_&typeID_&boxID is used as the name.
- Enabling the PCI and MMIO types of uncore blocks relies on NUMA
  support. These uncore blocks require the mapping information from a
  bus to a die. The current discovery table doesn't provide the mapping
  information, so pcibus_to_node() from NUMA is used to retrieve it. If
  NUMA is not supported, some uncore blocks may be unavailable.

To locate the MMIO page, SW has to find a PCI device with the unique
capability ID 0x23 and retrieve its BAR address.

The spec can be found at Snow Ridge or Ice Lake server's uncore document.
https://cdrdv2.intel.com/v1/dl/getContent/611319

Kan Liang (5):
  perf/x86/intel/uncore: Parse uncore discovery tables
  perf/x86/intel/uncore: Generic support for the MSR type of uncore
    blocks
  perf/x86/intel/uncore: Rename uncore_notifier to
    uncore_pci_sub_notifier
  perf/x86/intel/uncore: Generic support for the PCI type of uncore
    blocks
  perf/x86/intel/uncore: Generic support for the MMIO type of uncore
    blocks

 arch/x86/events/intel/Makefile           |   2 +-
 arch/x86/events/intel/uncore.c           | 188 ++++++++--
 arch/x86/events/intel/uncore.h           |  10 +-
 arch/x86/events/intel/uncore_discovery.c | 622 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/uncore_discovery.h | 131 +++++++
 5 files changed, 922 insertions(+), 31 deletions(-)
 create mode 100644 arch/x86/events/intel/uncore_discovery.c
 create mode 100644 arch/x86/events/intel/uncore_discovery.h

-- 
2.7.4



* [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
@ 2021-03-17 17:59 ` kan.liang
  2021-03-19  1:10   ` Namhyung Kim
                     ` (2 more replies)
  2021-03-17 17:59 ` [PATCH V2 2/5] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks kan.liang
                   ` (4 subsequent siblings)
  5 siblings, 3 replies; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A self-describing mechanism for the uncore PerfMon hardware has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, perf can 'discover' all the standard uncore
PerfMon registers in a machine.

The discovery mechanism relies on BIOS support. With a proper BIOS,
a PCI device with the unique capability ID 0x23 can be found on each
die. Perf can retrieve the information of all available uncore PerfMons
from the device via MMIO. The information is composed of one global
discovery table and several unit discovery tables.
- The global discovery table includes global uncore information of the
  die, e.g., the address of the global control register, the offset of
  the global status register, the number of uncore units, the offset of
  unit discovery tables, etc.
- The unit discovery table includes generic uncore unit information,
  e.g., the access type, the counter width, the address of counters,
  the address of the counter control, the unit ID, the unit type, etc.
  The unit is also called "box" in the code.
Perf can provide basic uncore support based on this information
with the following patches.

To locate the PCI device with the discovery tables, check the generic
PCI ID first. If it doesn't match, go through the entire PCI device tree
and locate the device with the unique capability ID.

The uncore information is similar among dies. To save parsing time and
space, only completely parse and store the discovery tables on the first
die and the first box of each die. The parsed information is stored in
an RB tree structure, intel_uncore_discovery_type. The size of the
stored discovery tables varies among platforms. It's around 4KB for a
Sapphire Rapids server.

If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
exits with -ENODEV, and nothing is changed.

Add a module parameter to disable the discovery feature, so that users
have an option to turn it off if a BIOS gets the discovery tables
wrong. With the current patchset, the uncore driver then exits with
-ENODEV. In the future, it may fall back to the hardcoded uncore driver
on a known platform.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/Makefile           |   2 +-
 arch/x86/events/intel/uncore.c           |  31 ++-
 arch/x86/events/intel/uncore_discovery.c | 318 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/uncore_discovery.h | 105 ++++++++++
 4 files changed, 448 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/events/intel/uncore_discovery.c
 create mode 100644 arch/x86/events/intel/uncore_discovery.h

diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index e67a588..10bde6c 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= lbr.o p4.o p6.o pt.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= intel-uncore.o
-intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
+intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE)	+= intel-cstate.o
 intel-cstate-objs			:= cstate.o
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 33c8180..d111370 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -4,7 +4,12 @@
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 #include "uncore.h"
+#include "uncore_discovery.h"
 
+static bool uncore_no_discover;
+module_param(uncore_no_discover, bool, 0);
+MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
+				     "(default: enable the discovery mechanism).");
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
@@ -1637,6 +1642,9 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 	.mmio_init = snr_uncore_mmio_init,
 };
 
+static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
+};
+
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP,		&nhm_uncore_init),
 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,		&nhm_uncore_init),
@@ -1684,17 +1692,21 @@ static int __init intel_uncore_init(void)
 	struct intel_uncore_init_fun *uncore_init;
 	int pret = 0, cret = 0, mret = 0, ret;
 
-	id = x86_match_cpu(intel_uncore_match);
-	if (!id)
-		return -ENODEV;
-
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return -ENODEV;
 
 	__uncore_max_dies =
 		topology_max_packages() * topology_max_die_per_package();
 
-	uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+	id = x86_match_cpu(intel_uncore_match);
+	if (!id) {
+		if (!uncore_no_discover && intel_uncore_has_discovery_tables())
+			uncore_init = (struct intel_uncore_init_fun *)&generic_uncore_init;
+		else
+			return -ENODEV;
+	} else
+		uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+
 	if (uncore_init->pci_init) {
 		pret = uncore_init->pci_init();
 		if (!pret)
@@ -1711,8 +1723,10 @@ static int __init intel_uncore_init(void)
 		mret = uncore_mmio_init();
 	}
 
-	if (cret && pret && mret)
-		return -ENODEV;
+	if (cret && pret && mret) {
+		ret = -ENODEV;
+		goto free_discovery;
+	}
 
 	/* Install hotplug callbacks to setup the targets for each package */
 	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE,
@@ -1727,6 +1741,8 @@ static int __init intel_uncore_init(void)
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_types_exit(uncore_mmio_uncores);
 	uncore_pci_exit();
+free_discovery:
+	intel_uncore_clear_discovery_tables();
 	return ret;
 }
 module_init(intel_uncore_init);
@@ -1737,5 +1753,6 @@ static void __exit intel_uncore_exit(void)
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_types_exit(uncore_mmio_uncores);
 	uncore_pci_exit();
+	intel_uncore_clear_discovery_tables();
 }
 module_exit(intel_uncore_exit);
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
new file mode 100644
index 0000000..9d5c8b2
--- /dev/null
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -0,0 +1,318 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Support Intel uncore PerfMon discovery mechanism.
+ * Copyright(c) 2021 Intel Corporation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "uncore.h"
+#include "uncore_discovery.h"
+
+static struct rb_root discovery_tables = RB_ROOT;
+static int num_discovered_types[UNCORE_ACCESS_MAX];
+
+static bool has_generic_discovery_table(void)
+{
+	struct pci_dev *dev;
+	int dvsec;
+
+	dev = pci_get_device(PCI_VENDOR_ID_INTEL, UNCORE_DISCOVERY_TABLE_DEVICE, NULL);
+	if (!dev)
+		return false;
+
+	/* A discovery table device has the unique capability ID. */
+	dvsec = pci_find_next_ext_capability(dev, 0, UNCORE_EXT_CAP_ID_DISCOVERY);
+	pci_dev_put(dev);
+	if (dvsec)
+		return true;
+
+	return false;
+}
+
+static int logical_die_id;
+
+static int get_device_die_id(struct pci_dev *dev)
+{
+	int cpu, node = pcibus_to_node(dev->bus);
+
+	/*
+	 * If the NUMA info is not available, assume that the logical die id is
+	 * continuous in the order in which the discovery table devices are
+	 * detected.
+	 */
+	if (node < 0)
+		return logical_die_id++;
+
+	for_each_cpu(cpu, cpumask_of_node(node)) {
+		struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+		if (c->initialized && cpu_to_node(cpu) == node)
+			return c->logical_die_id;
+	}
+
+	/*
+	 * All CPUs of a node may be offlined. For this case,
+	 * the PCI and MMIO type of uncore blocks which are
+	 * enumerated by the device will be unavailable.
+	 */
+	return -1;
+}
+
+#define __node_2_type(cur)	\
+	rb_entry((cur), struct intel_uncore_discovery_type, node)
+
+static inline int __type_cmp(const void *key, const struct rb_node *b)
+{
+	struct intel_uncore_discovery_type *type_b = __node_2_type(b);
+	const u16 *type_id = key;
+
+	if (type_b->type > *type_id)
+		return -1;
+	else if (type_b->type < *type_id)
+		return 1;
+
+	return 0;
+}
+
+static inline struct intel_uncore_discovery_type *
+search_uncore_discovery_type(u16 type_id)
+{
+	struct rb_node *node = rb_find(&type_id, &discovery_tables, __type_cmp);
+
+	return (node) ? __node_2_type(node) : NULL;
+}
+
+static inline bool __type_less(struct rb_node *a, const struct rb_node *b)
+{
+	return (__node_2_type(a)->type < __node_2_type(b)->type) ? true : false;
+}
+
+static struct intel_uncore_discovery_type *
+add_uncore_discovery_type(struct uncore_unit_discovery *unit)
+{
+	struct intel_uncore_discovery_type *type;
+
+	if (unit->access_type >= UNCORE_ACCESS_MAX) {
+		pr_warn("Unsupported access type %d\n", unit->access_type);
+		return NULL;
+	}
+
+	type = kzalloc(sizeof(struct intel_uncore_discovery_type), GFP_KERNEL);
+	if (!type)
+		return NULL;
+
+	type->box_ctrl_die = kcalloc(__uncore_max_dies, sizeof(u64), GFP_KERNEL);
+	if (!type->box_ctrl_die)
+		goto free_type;
+
+	type->access_type = unit->access_type;
+	num_discovered_types[type->access_type]++;
+	type->type = unit->box_type;
+
+	rb_add(&type->node, &discovery_tables, __type_less);
+
+	return type;
+
+free_type:
+	kfree(type);
+
+	return NULL;
+
+}
+
+static struct intel_uncore_discovery_type *
+get_uncore_discovery_type(struct uncore_unit_discovery *unit)
+{
+	struct intel_uncore_discovery_type *type;
+
+	type = search_uncore_discovery_type(unit->box_type);
+	if (type)
+		return type;
+
+	return add_uncore_discovery_type(unit);
+}
+
+static void
+uncore_insert_box_info(struct uncore_unit_discovery *unit,
+		       int die, bool parsed)
+{
+	struct intel_uncore_discovery_type *type;
+	unsigned int *box_offset, *ids;
+	int i;
+
+	if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset || !unit->ctr_offset))
+		return;
+
+	if (parsed) {
+		type = search_uncore_discovery_type(unit->box_type);
+		if (WARN_ON_ONCE(!type))
+			return;
+		/* Store the first box of each die */
+		if (!type->box_ctrl_die[die])
+			type->box_ctrl_die[die] = unit->ctl;
+		return;
+	}
+
+	type = get_uncore_discovery_type(unit);
+	if (!type)
+		return;
+
+	box_offset = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
+	if (!box_offset)
+		return;
+
+	ids = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
+	if (!ids)
+		goto free_box_offset;
+
+	/* Store generic information for the first box */
+	if (!type->num_boxes) {
+		type->box_ctrl = unit->ctl;
+		type->box_ctrl_die[die] = unit->ctl;
+		type->num_counters = unit->num_regs;
+		type->counter_width = unit->bit_width;
+		type->ctl_offset = unit->ctl_offset;
+		type->ctr_offset = unit->ctr_offset;
+		*ids = unit->box_id;
+		goto end;
+	}
+
+	for (i = 0; i < type->num_boxes; i++) {
+		ids[i] = type->ids[i];
+		box_offset[i] = type->box_offset[i];
+
+		if (WARN_ON_ONCE(unit->box_id == ids[i]))
+			goto free_ids;
+	}
+	ids[i] = unit->box_id;
+	box_offset[i] = unit->ctl - type->box_ctrl;
+	kfree(type->ids);
+	kfree(type->box_offset);
+end:
+	type->ids = ids;
+	type->box_offset = box_offset;
+	type->num_boxes++;
+	return;
+
+free_ids:
+	kfree(ids);
+
+free_box_offset:
+	kfree(box_offset);
+
+}
+
+static int parse_discovery_table(struct pci_dev *dev, int die,
+				 u32 bar_offset, bool *parsed)
+{
+	struct uncore_global_discovery global;
+	struct uncore_unit_discovery unit;
+	void __iomem *io_addr;
+	resource_size_t addr;
+	unsigned long size;
+	u32 val;
+	int i;
+
+	pci_read_config_dword(dev, bar_offset, &val);
+
+	if (val & UNCORE_DISCOVERY_MASK)
+		return -EINVAL;
+
+	addr = (resource_size_t)(val & ~UNCORE_DISCOVERY_MASK);
+	size = UNCORE_DISCOVERY_GLOBAL_MAP_SIZE;
+	io_addr = ioremap(addr, size);
+	if (!io_addr)
+		return -ENOMEM;
+
+	/* Read Global Discovery State */
+	memcpy_fromio(&global, io_addr, sizeof(struct uncore_global_discovery));
+	if (uncore_discovery_invalid_unit(global)) {
+		pr_info("Invalid Global Discovery State: 0x%llx 0x%llx 0x%llx\n",
+			global.table1, global.ctl, global.table3);
+		iounmap(io_addr);
+		return -EINVAL;
+	}
+	iounmap(io_addr);
+
+	size = (1 + global.max_units) * global.stride * 8;
+	io_addr = ioremap(addr, size);
+	if (!io_addr)
+		return -ENOMEM;
+
+	/* Parsing Unit Discovery State */
+	for (i = 0; i < global.max_units; i++) {
+		memcpy_fromio(&unit, io_addr + (i + 1) * (global.stride * 8),
+			      sizeof(struct uncore_unit_discovery));
+
+		if (uncore_discovery_invalid_unit(unit))
+			continue;
+
+		if (unit.access_type >= UNCORE_ACCESS_MAX)
+			continue;
+
+		uncore_insert_box_info(&unit, die, *parsed);
+	}
+
+	*parsed = true;
+	iounmap(io_addr);
+	return 0;
+}
+
+bool intel_uncore_has_discovery_tables(void)
+{
+	u32 device, val, entry_id, bar_offset;
+	int die, dvsec = 0, ret = true;
+	struct pci_dev *dev = NULL;
+	bool parsed = false;
+
+	if (has_generic_discovery_table())
+		device = UNCORE_DISCOVERY_TABLE_DEVICE;
+	else
+		device = PCI_ANY_ID;
+
+	/*
+	 * Start a new search and iterates through the list of
+	 * the discovery table devices.
+	 */
+	while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
+		while ((dvsec = pci_find_next_ext_capability(dev, dvsec, UNCORE_EXT_CAP_ID_DISCOVERY))) {
+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC_OFFSET, &val);
+			entry_id = val & UNCORE_DISCOVERY_DVSEC_ID_MASK;
+			if (entry_id != UNCORE_DISCOVERY_DVSEC_ID_PMON)
+				continue;
+
+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC2_OFFSET, &val);
+
+			if (val & ~UNCORE_DISCOVERY_DVSEC2_BIR_MASK) {
+				ret = false;
+				goto err;
+			}
+			bar_offset = UNCORE_DISCOVERY_BIR_BASE +
+				     (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
+
+			die = get_device_die_id(dev);
+			if (die < 0)
+				continue;
+
+			parse_discovery_table(dev, die, bar_offset, &parsed);
+		}
+	}
+
+	/* None of the discovery tables are available */
+	if (!parsed)
+		ret = false;
+err:
+	pci_dev_put(dev);
+
+	return ret;
+}
+
+void intel_uncore_clear_discovery_tables(void)
+{
+	struct intel_uncore_discovery_type *type, *next;
+
+	rbtree_postorder_for_each_entry_safe(type, next, &discovery_tables, node) {
+		kfree(type->box_ctrl_die);
+		kfree(type);
+	}
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
new file mode 100644
index 0000000..95afa39
--- /dev/null
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/* Generic device ID of a discovery table device */
+#define UNCORE_DISCOVERY_TABLE_DEVICE		0x09a7
+/* Capability ID for a discovery table device */
+#define UNCORE_EXT_CAP_ID_DISCOVERY		0x23
+/* First DVSEC offset */
+#define UNCORE_DISCOVERY_DVSEC_OFFSET		0x8
+/* Mask of the supported discovery entry type */
+#define UNCORE_DISCOVERY_DVSEC_ID_MASK		0xffff
+/* PMON discovery entry type ID */
+#define UNCORE_DISCOVERY_DVSEC_ID_PMON		0x1
+/* Second DVSEC offset */
+#define UNCORE_DISCOVERY_DVSEC2_OFFSET		0xc
+/* Mask of the discovery table BAR offset */
+#define UNCORE_DISCOVERY_DVSEC2_BIR_MASK	0x7
+/* Discovery table BAR base offset */
+#define UNCORE_DISCOVERY_BIR_BASE		0x10
+/* Discovery table BAR step */
+#define UNCORE_DISCOVERY_BIR_STEP		0x4
+/* Mask of the discovery table offset */
+#define UNCORE_DISCOVERY_MASK			0xf
+/* Global discovery table size */
+#define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE	0x20
+
+#define uncore_discovery_invalid_unit(unit)			\
+	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
+	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
+	 unit.table3 == -1ULL)
+
+enum uncore_access_type {
+	UNCORE_ACCESS_MSR	= 0,
+	UNCORE_ACCESS_MMIO,
+	UNCORE_ACCESS_PCI,
+
+	UNCORE_ACCESS_MAX,
+};
+
+struct uncore_global_discovery {
+	union {
+		u64	table1;
+		struct {
+			u64	type : 8,
+				stride : 8,
+				max_units : 10,
+				__reserved_1 : 36,
+				access_type : 2;
+		};
+	};
+
+	u64	ctl;		/* Global Control Address */
+
+	union {
+		u64	table3;
+		struct {
+			u64	status_offset : 8,
+				num_status : 16,
+				__reserved_2 : 40;
+		};
+	};
+};
+
+struct uncore_unit_discovery {
+	union {
+		u64	table1;
+		struct {
+			u64	num_regs : 8,
+				ctl_offset : 8,
+				bit_width : 8,
+				ctr_offset : 8,
+				status_offset : 8,
+				__reserved_1 : 22,
+				access_type : 2;
+			};
+		};
+
+	u64	ctl;		/* Unit Control Address */
+
+	union {
+		u64	table3;
+		struct {
+			u64	box_type : 16,
+				box_id : 16,
+				__reserved_2 : 32;
+		};
+	};
+};
+
+struct intel_uncore_discovery_type {
+	struct rb_node	node;
+	enum uncore_access_type	access_type;
+	u64		box_ctrl;	/* Unit ctrl addr of the first box */
+	u64		*box_ctrl_die;	/* Unit ctrl addr of the first box of each die */
+	u16		type;		/* Type ID of the uncore block */
+	u8		num_counters;
+	u8		counter_width;
+	u8		ctl_offset;	/* Counter Control 0 offset */
+	u8		ctr_offset;	/* Counter 0 offset */
+	u16		num_boxes;	/* number of boxes for the uncore block */
+	unsigned int	*ids;		/* Box IDs */
+	unsigned int	*box_offset;	/* Box offset */
+};
+
+bool intel_uncore_has_discovery_tables(void);
+void intel_uncore_clear_discovery_tables(void);
-- 
2.7.4



* [PATCH V2 2/5] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
@ 2021-03-17 17:59 ` kan.liang
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2021-03-17 17:59 ` [PATCH V2 3/5] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier kan.liang
                   ` (3 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The discovery table provides the generic uncore block information for
the MSR type of uncore blocks, e.g., the counter width, the number of
counters, the location of control/counter registers, which is good
enough to provide basic uncore support. It can be used as a fallback
solution when the kernel doesn't support a platform.

The name of the uncore box cannot be retrieved from the discovery table.
uncore_type_&typeID_&boxID will be used as its name. Save the type ID
and the box ID information in the struct intel_uncore_type.
Factor out uncore_get_pmu_name() to handle different naming methods.

Implement generic support for the MSR type of uncore block.

Some advanced features, such as filters and constraints, cannot be
retrieved from the discovery tables. Features that rely on that
information are not supported here.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c           |  45 ++++++++---
 arch/x86/events/intel/uncore.h           |   3 +
 arch/x86/events/intel/uncore_discovery.c | 126 +++++++++++++++++++++++++++++++
 arch/x86/events/intel/uncore_discovery.h |  18 +++++
 4 files changed, 182 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index d111370..dabc01f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -10,7 +10,7 @@ static bool uncore_no_discover;
 module_param(uncore_no_discover, bool, 0);
 MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
 				     "(default: enable the discovery mechanism).");
-static struct intel_uncore_type *empty_uncore[] = { NULL, };
+struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
 struct intel_uncore_type **uncore_mmio_uncores = empty_uncore;
@@ -834,6 +834,34 @@ static const struct attribute_group uncore_pmu_attr_group = {
 	.attrs = uncore_pmu_attrs,
 };
 
+static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
+{
+	struct intel_uncore_type *type = pmu->type;
+
+	/*
+	 * No uncore block name in discovery table.
+	 * Use uncore_type_&typeid_&boxid as name.
+	 */
+	if (!type->name) {
+		if (type->num_boxes == 1)
+			sprintf(pmu->name, "uncore_type_%u", type->type_id);
+		else {
+			sprintf(pmu->name, "uncore_type_%u_%d",
+				type->type_id, type->box_ids[pmu->pmu_idx]);
+		}
+		return;
+	}
+
+	if (type->num_boxes == 1) {
+		if (strlen(type->name) > 0)
+			sprintf(pmu->name, "uncore_%s", type->name);
+		else
+			sprintf(pmu->name, "uncore");
+	} else
+		sprintf(pmu->name, "uncore_%s_%d", type->name, pmu->pmu_idx);
+
+}
+
 static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 {
 	int ret;
@@ -860,15 +888,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 		pmu->pmu.attr_update = pmu->type->attr_update;
 	}
 
-	if (pmu->type->num_boxes == 1) {
-		if (strlen(pmu->type->name) > 0)
-			sprintf(pmu->name, "uncore_%s", pmu->type->name);
-		else
-			sprintf(pmu->name, "uncore");
-	} else {
-		sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
-			pmu->pmu_idx);
-	}
+	uncore_get_pmu_name(pmu);
 
 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
 	if (!ret)
@@ -909,6 +929,10 @@ static void uncore_type_exit(struct intel_uncore_type *type)
 		kfree(type->pmus);
 		type->pmus = NULL;
 	}
+	if (type->box_ids) {
+		kfree(type->box_ids);
+		type->box_ids = NULL;
+	}
 	kfree(type->events_group);
 	type->events_group = NULL;
 }
@@ -1643,6 +1667,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 };
 
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
+	.cpu_init = intel_uncore_generic_uncore_cpu_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index a3c6e16..05c8e06 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -50,6 +50,7 @@ struct intel_uncore_type {
 	int perf_ctr_bits;
 	int fixed_ctr_bits;
 	int num_freerunning_types;
+	int type_id;
 	unsigned perf_ctr;
 	unsigned event_ctl;
 	unsigned event_mask;
@@ -66,6 +67,7 @@ struct intel_uncore_type {
 	unsigned single_fixed:1;
 	unsigned pair_ctr_ctl:1;
 	unsigned *msr_offsets;
+	unsigned *box_ids;
 	struct event_constraint unconstrainted;
 	struct event_constraint *constraints;
 	struct intel_uncore_pmu *pmus;
@@ -547,6 +549,7 @@ uncore_get_constraint(struct intel_uncore_box *box, struct perf_event *event);
 void uncore_put_constraint(struct intel_uncore_box *box, struct perf_event *event);
 u64 uncore_shared_reg_config(struct intel_uncore_box *box, int idx);
 
+extern struct intel_uncore_type *empty_uncore[];
 extern struct intel_uncore_type **uncore_msr_uncores;
 extern struct intel_uncore_type **uncore_pci_uncores;
 extern struct intel_uncore_type **uncore_mmio_uncores;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 9d5c8b2..b27f4a9 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -316,3 +316,129 @@ void intel_uncore_clear_discovery_tables(void)
 		kfree(type);
 	}
 }
+
+DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
+DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
+DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
+DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
+DEFINE_UNCORE_FORMAT_ATTR(thresh, thresh, "config:24-31");
+
+static struct attribute *generic_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh.attr,
+	NULL,
+};
+
+static const struct attribute_group generic_uncore_format_group = {
+	.name = "format",
+	.attrs = generic_uncore_formats_attr,
+};
+
+static void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
+}
+
+static void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_FRZ);
+}
+
+static void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), 0);
+}
+
+static void intel_generic_uncore_msr_enable_event(struct intel_uncore_box *box,
+					    struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config);
+}
+
+static void intel_generic_uncore_msr_disable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, 0);
+}
+
+static struct intel_uncore_ops generic_uncore_msr_ops = {
+	.init_box		= intel_generic_uncore_msr_init_box,
+	.disable_box		= intel_generic_uncore_msr_disable_box,
+	.enable_box		= intel_generic_uncore_msr_enable_box,
+	.disable_event		= intel_generic_uncore_msr_disable_event,
+	.enable_event		= intel_generic_uncore_msr_enable_event,
+	.read_counter		= uncore_msr_read_counter,
+};
+
+static bool uncore_update_uncore_type(enum uncore_access_type type_id,
+				      struct intel_uncore_type *uncore,
+				      struct intel_uncore_discovery_type *type)
+{
+	uncore->type_id = type->type;
+	uncore->num_boxes = type->num_boxes;
+	uncore->num_counters = type->num_counters;
+	uncore->perf_ctr_bits = type->counter_width;
+	uncore->box_ids = type->ids;
+
+	switch (type_id) {
+	case UNCORE_ACCESS_MSR:
+		uncore->ops = &generic_uncore_msr_ops;
+		uncore->perf_ctr = (unsigned int)type->box_ctrl + type->ctr_offset;
+		uncore->event_ctl = (unsigned int)type->box_ctrl + type->ctl_offset;
+		uncore->box_ctl = (unsigned int)type->box_ctrl;
+		uncore->msr_offsets = type->box_offset;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+static struct intel_uncore_type **
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
+{
+	struct intel_uncore_discovery_type *type;
+	struct intel_uncore_type **uncores;
+	struct intel_uncore_type *uncore;
+	struct rb_node *node;
+	int i = 0;
+
+	uncores = kcalloc(num_discovered_types[type_id] + 1,
+			  sizeof(struct intel_uncore_type *), GFP_KERNEL);
+	if (!uncores)
+		return empty_uncore;
+
+	for (node = rb_first(&discovery_tables); node; node = rb_next(node)) {
+		type = rb_entry(node, struct intel_uncore_discovery_type, node);
+		if (type->access_type != type_id)
+			continue;
+
+		uncore = kzalloc(sizeof(struct intel_uncore_type), GFP_KERNEL);
+		if (!uncore)
+			break;
+
+		uncore->event_mask = GENERIC_PMON_RAW_EVENT_MASK;
+		uncore->format_group = &generic_uncore_format_group;
+
+		if (!uncore_update_uncore_type(type_id, uncore, type)) {
+			kfree(uncore);
+			continue;
+		}
+		uncores[i++] = uncore;
+	}
+
+	return uncores;
+}
+
+void intel_uncore_generic_uncore_cpu_init(void)
+{
+	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 95afa39..87078ba 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -28,6 +28,23 @@
 	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
 	 unit.table3 == -1ULL)
 
+#define GENERIC_PMON_CTL_EV_SEL_MASK	0x000000ff
+#define GENERIC_PMON_CTL_UMASK_MASK	0x0000ff00
+#define GENERIC_PMON_CTL_EDGE_DET	(1 << 18)
+#define GENERIC_PMON_CTL_INVERT		(1 << 23)
+#define GENERIC_PMON_CTL_TRESH_MASK	0xff000000
+#define GENERIC_PMON_RAW_EVENT_MASK	(GENERIC_PMON_CTL_EV_SEL_MASK | \
+					 GENERIC_PMON_CTL_UMASK_MASK | \
+					 GENERIC_PMON_CTL_EDGE_DET | \
+					 GENERIC_PMON_CTL_INVERT | \
+					 GENERIC_PMON_CTL_TRESH_MASK)
+
+#define GENERIC_PMON_BOX_CTL_FRZ	(1 << 0)
+#define GENERIC_PMON_BOX_CTL_RST_CTRL	(1 << 8)
+#define GENERIC_PMON_BOX_CTL_RST_CTRS	(1 << 9)
+#define GENERIC_PMON_BOX_CTL_INT	(GENERIC_PMON_BOX_CTL_RST_CTRL | \
+					 GENERIC_PMON_BOX_CTL_RST_CTRS)
+
 enum uncore_access_type {
 	UNCORE_ACCESS_MSR	= 0,
 	UNCORE_ACCESS_MMIO,
@@ -103,3 +120,4 @@ struct intel_uncore_discovery_type {
 
 bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
+void intel_uncore_generic_uncore_cpu_init(void);
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [PATCH V2 3/5] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
  2021-03-17 17:59 ` [PATCH V2 2/5] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks kan.liang
@ 2021-03-17 17:59 ` kan.liang
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2021-03-17 17:59 ` [PATCH V2 4/5] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks kan.liang
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Perf will register the PMUs for the PCI type of uncore blocks with a
method similar to the PCI sub driver's. The method requires a BUS

notifier to support hotplug. The current BUS notifier cannot be reused,
because it searches a const id_table for the corresponding registered
PMU. The PCI type of uncore blocks in the discovery tables doesn't
provide an id_table.

Factor out uncore_bus_notify() and add the pointer of an id_table as a
parameter. The uncore_bus_notify() will be reused in the following
patch.

The current BUS notifier is only used by the PCI sub driver. Its name is
too generic. Rename it to uncore_pci_sub_notifier, which is specific to
the PCI sub driver.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index dabc01f..391fa7c 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1203,7 +1203,8 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 }
 
 static int uncore_bus_notify(struct notifier_block *nb,
-			     unsigned long action, void *data)
+			     unsigned long action, void *data,
+			     const struct pci_device_id *ids)
 {
 	struct device *dev = data;
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -1214,7 +1215,7 @@ static int uncore_bus_notify(struct notifier_block *nb,
 	if (action != BUS_NOTIFY_DEL_DEVICE)
 		return NOTIFY_DONE;
 
-	pmu = uncore_pci_find_dev_pmu(pdev, uncore_pci_sub_driver->id_table);
+	pmu = uncore_pci_find_dev_pmu(pdev, ids);
 	if (!pmu)
 		return NOTIFY_DONE;
 
@@ -1226,8 +1227,15 @@ static int uncore_bus_notify(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static struct notifier_block uncore_notifier = {
-	.notifier_call = uncore_bus_notify,
+static int uncore_pci_sub_bus_notify(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	return uncore_bus_notify(nb, action, data,
+				 uncore_pci_sub_driver->id_table);
+}
+
+static struct notifier_block uncore_pci_sub_notifier = {
+	.notifier_call = uncore_pci_sub_bus_notify,
 };
 
 static void uncore_pci_sub_driver_init(void)
@@ -1268,7 +1276,7 @@ static void uncore_pci_sub_driver_init(void)
 		ids++;
 	}
 
-	if (notify && bus_register_notifier(&pci_bus_type, &uncore_notifier))
+	if (notify && bus_register_notifier(&pci_bus_type, &uncore_pci_sub_notifier))
 		notify = false;
 
 	if (!notify)
@@ -1319,7 +1327,7 @@ static void uncore_pci_exit(void)
 	if (pcidrv_registered) {
 		pcidrv_registered = false;
 		if (uncore_pci_sub_driver)
-			bus_unregister_notifier(&pci_bus_type, &uncore_notifier);
+			bus_unregister_notifier(&pci_bus_type, &uncore_pci_sub_notifier);
 		pci_unregister_driver(uncore_pci_driver);
 		uncore_types_exit(uncore_pci_uncores);
 		kfree(uncore_extra_pci_dev);
-- 
2.7.4



* [PATCH V2 4/5] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
                   ` (2 preceding siblings ...)
  2021-03-17 17:59 ` [PATCH V2 3/5] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier kan.liang
@ 2021-03-17 17:59 ` kan.liang
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2021-03-17 17:59 ` [PATCH V2 5/5] perf/x86/intel/uncore: Generic support for the MMIO " kan.liang
  2022-09-20 18:25 ` [PATCH V2 0/5] Uncore PMON discovery mechanism support Kin Cho
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The discovery table provides the generic uncore block information
for the PCI type of uncore blocks, which is sufficient for basic
uncore support.

The PCI BUS and DEVFN information can be retrieved from the box control
field. Introduce uncore_pci_pmus_register() to register all the
PCICFG type of uncore blocks. The old PCI probe/remove path is dropped.

The PCI BUS and DEVFN information differs among dies. Add box_ctls to
store the box control field of each die.

Add a new BUS notifier for the PCI type of uncore blocks to support
hotplug. If a device is hot removed, the corresponding registered PMU
has to be unregistered. Perf cannot locate the PMU by searching a const
pci_device_id table, because the discovery tables don't provide such
information. Introduce uncore_pci_find_dev_pmu_from_types() to search
the whole uncore_pci_uncores for the PMU.

Implement generic support for the PCI type of uncore blocks.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c           | 91 +++++++++++++++++++++++++++++---
 arch/x86/events/intel/uncore.h           |  6 ++-
 arch/x86/events/intel/uncore_discovery.c | 80 ++++++++++++++++++++++++++++
 arch/x86/events/intel/uncore_discovery.h |  7 +++
 4 files changed, 177 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 391fa7c..3109082 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1032,10 +1032,37 @@ static int uncore_pci_get_dev_die_info(struct pci_dev *pdev, int *die)
 	return 0;
 }
 
+static struct intel_uncore_pmu *
+uncore_pci_find_dev_pmu_from_types(struct pci_dev *pdev)
+{
+	struct intel_uncore_type **types = uncore_pci_uncores;
+	struct intel_uncore_type *type;
+	u64 box_ctl;
+	int i, die;
+
+	for (; *types; types++) {
+		type = *types;
+		for (die = 0; die < __uncore_max_dies; die++) {
+			for (i = 0; i < type->num_boxes; i++) {
+				if (!type->box_ctls[die])
+					continue;
+				box_ctl = type->box_ctls[die] + type->pci_offsets[i];
+				if (pdev->devfn == UNCORE_DISCOVERY_PCI_DEVFN(box_ctl) &&
+				    pdev->bus->number == UNCORE_DISCOVERY_PCI_BUS(box_ctl) &&
+				    pci_domain_nr(pdev->bus) == UNCORE_DISCOVERY_PCI_DOMAIN(box_ctl))
+					return &type->pmus[i];
+			}
+		}
+	}
+
+	return NULL;
+}
+
 /*
  * Find the PMU of a PCI device.
  * @pdev: The PCI device.
  * @ids: The ID table of the available PCI devices with a PMU.
+ *       If NULL, search the whole uncore_pci_uncores.
  */
 static struct intel_uncore_pmu *
 uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
@@ -1045,6 +1072,9 @@ uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
 	kernel_ulong_t data;
 	unsigned int devfn;
 
+	if (!ids)
+		return uncore_pci_find_dev_pmu_from_types(pdev);
+
 	while (ids && ids->vendor) {
 		if ((ids->vendor == pdev->vendor) &&
 		    (ids->device == pdev->device)) {
@@ -1283,6 +1313,48 @@ static void uncore_pci_sub_driver_init(void)
 		uncore_pci_sub_driver = NULL;
 }
 
+static int uncore_pci_bus_notify(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	return uncore_bus_notify(nb, action, data, NULL);
+}
+
+static struct notifier_block uncore_pci_notifier = {
+	.notifier_call = uncore_pci_bus_notify,
+};
+
+
+static void uncore_pci_pmus_register(void)
+{
+	struct intel_uncore_type **types = uncore_pci_uncores;
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct pci_dev *pdev;
+	u64 box_ctl;
+	int i, die;
+
+	for (; *types; types++) {
+		type = *types;
+		for (die = 0; die < __uncore_max_dies; die++) {
+			for (i = 0; i < type->num_boxes; i++) {
+				if (!type->box_ctls[die])
+					continue;
+				box_ctl = type->box_ctls[die] + type->pci_offsets[i];
+				pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(box_ctl),
+								   UNCORE_DISCOVERY_PCI_BUS(box_ctl),
+								   UNCORE_DISCOVERY_PCI_DEVFN(box_ctl));
+				if (!pdev)
+					continue;
+				pmu = &type->pmus[i];
+
+				uncore_pci_pmu_register(pdev, type, pmu, die);
+			}
+		}
+	}
+
+	bus_register_notifier(&pci_bus_type, &uncore_pci_notifier);
+}
+
 static int __init uncore_pci_init(void)
 {
 	size_t size;
@@ -1299,12 +1371,15 @@ static int __init uncore_pci_init(void)
 	if (ret)
 		goto errtype;
 
-	uncore_pci_driver->probe = uncore_pci_probe;
-	uncore_pci_driver->remove = uncore_pci_remove;
+	if (uncore_pci_driver) {
+		uncore_pci_driver->probe = uncore_pci_probe;
+		uncore_pci_driver->remove = uncore_pci_remove;
 
-	ret = pci_register_driver(uncore_pci_driver);
-	if (ret)
-		goto errtype;
+		ret = pci_register_driver(uncore_pci_driver);
+		if (ret)
+			goto errtype;
+	} else
+		uncore_pci_pmus_register();
 
 	if (uncore_pci_sub_driver)
 		uncore_pci_sub_driver_init();
@@ -1328,7 +1403,10 @@ static void uncore_pci_exit(void)
 		pcidrv_registered = false;
 		if (uncore_pci_sub_driver)
 			bus_unregister_notifier(&pci_bus_type, &uncore_pci_sub_notifier);
-		pci_unregister_driver(uncore_pci_driver);
+		if (uncore_pci_driver)
+			pci_unregister_driver(uncore_pci_driver);
+		else
+			bus_unregister_notifier(&pci_bus_type, &uncore_pci_notifier);
 		uncore_types_exit(uncore_pci_uncores);
 		kfree(uncore_extra_pci_dev);
 		uncore_free_pcibus_map();
@@ -1676,6 +1754,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
 	.cpu_init = intel_uncore_generic_uncore_cpu_init,
+	.pci_init = intel_uncore_generic_uncore_pci_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 05c8e06..76fc898 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -58,6 +58,7 @@ struct intel_uncore_type {
 	unsigned fixed_ctr;
 	unsigned fixed_ctl;
 	unsigned box_ctl;
+	u64 *box_ctls;	/* Unit ctrl addr of the first box of each die */
 	union {
 		unsigned msr_offset;
 		unsigned mmio_offset;
@@ -66,7 +67,10 @@ struct intel_uncore_type {
 	unsigned num_shared_regs:8;
 	unsigned single_fixed:1;
 	unsigned pair_ctr_ctl:1;
-	unsigned *msr_offsets;
+	union {
+		unsigned *msr_offsets;
+		unsigned *pci_offsets;
+	};
 	unsigned *box_ids;
 	struct event_constraint unconstrainted;
 	struct event_constraint *constraints;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index b27f4a9..01aa2c0 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -377,6 +377,71 @@ static struct intel_uncore_ops generic_uncore_msr_ops = {
 	.read_counter		= uncore_msr_read_counter,
 };
 
+static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
+	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
+}
+
+static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_FRZ);
+}
+
+static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	pci_write_config_dword(pdev, box_ctl, 0);
+}
+
+static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
+					    struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config);
+}
+
+static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, 0);
+}
+
+static u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+					   struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count = 0;
+
+	pci_read_config_dword(pdev, hwc->event_base, (u32 *)&count);
+	pci_read_config_dword(pdev, hwc->event_base + 4, (u32 *)&count + 1);
+
+	return count;
+}
+
+static struct intel_uncore_ops generic_uncore_pci_ops = {
+	.init_box	= intel_generic_uncore_pci_init_box,
+	.disable_box	= intel_generic_uncore_pci_disable_box,
+	.enable_box	= intel_generic_uncore_pci_enable_box,
+	.disable_event	= intel_generic_uncore_pci_disable_event,
+	.enable_event	= intel_generic_uncore_pci_enable_event,
+	.read_counter	= intel_generic_uncore_pci_read_counter,
+};
+
 static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 				      struct intel_uncore_type *uncore,
 				      struct intel_uncore_discovery_type *type)
@@ -395,6 +460,14 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 		uncore->box_ctl = (unsigned int)type->box_ctrl;
 		uncore->msr_offsets = type->box_offset;
 		break;
+	case UNCORE_ACCESS_PCI:
+		uncore->ops = &generic_uncore_pci_ops;
+		uncore->perf_ctr = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl) + type->ctr_offset;
+		uncore->event_ctl = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl) + type->ctl_offset;
+		uncore->box_ctl = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl);
+		uncore->box_ctls = type->box_ctrl_die;
+		uncore->pci_offsets = type->box_offset;
+		break;
 	default:
 		return false;
 	}
@@ -442,3 +515,10 @@ void intel_uncore_generic_uncore_cpu_init(void)
 {
 	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
 }
+
+int intel_uncore_generic_uncore_pci_init(void)
+{
+	uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI);
+
+	return 0;
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 87078ba..1639ff7 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -23,6 +23,12 @@
 /* Global discovery table size */
 #define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE	0x20
 
+#define UNCORE_DISCOVERY_PCI_DOMAIN(data)	((data >> 28) & 0x7)
+#define UNCORE_DISCOVERY_PCI_BUS(data)		((data >> 20) & 0xff)
+#define UNCORE_DISCOVERY_PCI_DEVFN(data)	((data >> 12) & 0xff)
+#define UNCORE_DISCOVERY_PCI_BOX_CTRL(data)	(data & 0xfff)
+
+
 #define uncore_discovery_invalid_unit(unit)			\
 	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
 	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
@@ -121,3 +127,4 @@ struct intel_uncore_discovery_type {
 bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
 void intel_uncore_generic_uncore_cpu_init(void);
+int intel_uncore_generic_uncore_pci_init(void);
-- 
2.7.4



* [PATCH V2 5/5] perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
                   ` (3 preceding siblings ...)
  2021-03-17 17:59 ` [PATCH V2 4/5] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks kan.liang
@ 2021-03-17 17:59 ` kan.liang
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2022-09-20 18:25 ` [PATCH V2 0/5] Uncore PMON discovery mechanism support Kin Cho
  5 siblings, 1 reply; 22+ messages in thread
From: kan.liang @ 2021-03-17 17:59 UTC (permalink / raw)
  To: peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The discovery table provides the generic uncore block information
for the MMIO type of uncore blocks, which is sufficient for basic
uncore support.

The box control field is composed of the BAR address and box control
offset. When initializing the uncore blocks, perf should ioremap the
address from the box control field.

Implement generic support for the MMIO type of uncore blocks.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c           |  1 +
 arch/x86/events/intel/uncore.h           |  1 +
 arch/x86/events/intel/uncore_discovery.c | 98 ++++++++++++++++++++++++++++++++
 arch/x86/events/intel/uncore_discovery.h |  1 +
 4 files changed, 101 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 3109082..35b3470 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1755,6 +1755,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
 	.cpu_init = intel_uncore_generic_uncore_cpu_init,
 	.pci_init = intel_uncore_generic_uncore_pci_init,
+	.mmio_init = intel_uncore_generic_uncore_mmio_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 76fc898..549cfb2 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -70,6 +70,7 @@ struct intel_uncore_type {
 	union {
 		unsigned *msr_offsets;
 		unsigned *pci_offsets;
+		unsigned *mmio_offsets;
 	};
 	unsigned *box_ids;
 	struct event_constraint unconstrainted;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 01aa2c0..7a2329a 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -442,6 +442,90 @@ static struct intel_uncore_ops generic_uncore_pci_ops = {
 	.read_counter	= intel_generic_uncore_pci_read_counter,
 };
 
+#define UNCORE_GENERIC_MMIO_SIZE		0x4000
+
+static unsigned int generic_uncore_mmio_box_ctl(struct intel_uncore_box *box)
+{
+	struct intel_uncore_type *type = box->pmu->type;
+
+	if (!type->box_ctls || !type->box_ctls[box->dieid] || !type->mmio_offsets)
+		return 0;
+
+	return type->box_ctls[box->dieid] + type->mmio_offsets[box->pmu->pmu_idx];
+}
+
+static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+{
+	unsigned int box_ctl = generic_uncore_mmio_box_ctl(box);
+	struct intel_uncore_type *type = box->pmu->type;
+	resource_size_t addr;
+
+	if (!box_ctl) {
+		pr_warn("Uncore type %d box %d: Invalid box control address.\n",
+			type->type_id, type->box_ids[box->pmu->pmu_idx]);
+		return;
+	}
+
+	addr = box_ctl;
+	box->io_addr = ioremap(addr, UNCORE_GENERIC_MMIO_SIZE);
+	if (!box->io_addr) {
+		pr_warn("Uncore type %d box %d: ioremap error for 0x%llx.\n",
+			type->type_id, type->box_ids[box->pmu->pmu_idx],
+			(unsigned long long)addr);
+		return;
+	}
+
+	writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
+{
+	if (!box->io_addr)
+		return;
+
+	writel(GENERIC_PMON_BOX_CTL_FRZ, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
+{
+	if (!box->io_addr)
+		return;
+
+	writel(0, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!box->io_addr)
+		return;
+
+	writel(hwc->config, box->io_addr + hwc->config_base);
+}
+
+static void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+					      struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!box->io_addr)
+		return;
+
+	writel(0, box->io_addr + hwc->config_base);
+}
+
+static struct intel_uncore_ops generic_uncore_mmio_ops = {
+	.init_box	= intel_generic_uncore_mmio_init_box,
+	.exit_box	= uncore_mmio_exit_box,
+	.disable_box	= intel_generic_uncore_mmio_disable_box,
+	.enable_box	= intel_generic_uncore_mmio_enable_box,
+	.disable_event	= intel_generic_uncore_mmio_disable_event,
+	.enable_event	= intel_generic_uncore_mmio_enable_event,
+	.read_counter	= uncore_mmio_read_counter,
+};
+
 static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 				      struct intel_uncore_type *uncore,
 				      struct intel_uncore_discovery_type *type)
@@ -468,6 +552,15 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 		uncore->box_ctls = type->box_ctrl_die;
 		uncore->pci_offsets = type->box_offset;
 		break;
+	case UNCORE_ACCESS_MMIO:
+		uncore->ops = &generic_uncore_mmio_ops;
+		uncore->perf_ctr = (unsigned int)type->ctr_offset;
+		uncore->event_ctl = (unsigned int)type->ctl_offset;
+		uncore->box_ctl = (unsigned int)type->box_ctrl;
+		uncore->box_ctls = type->box_ctrl_die;
+		uncore->mmio_offsets = type->box_offset;
+		uncore->mmio_map_size = UNCORE_GENERIC_MMIO_SIZE;
+		break;
 	default:
 		return false;
 	}
@@ -522,3 +615,8 @@ int intel_uncore_generic_uncore_pci_init(void)
 
 	return 0;
 }
+
+void intel_uncore_generic_uncore_mmio_init(void)
+{
+	uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO);
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 1639ff7..1d65293 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -128,3 +128,4 @@ bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
 void intel_uncore_generic_uncore_cpu_init(void);
 int intel_uncore_generic_uncore_pci_init(void);
+void intel_uncore_generic_uncore_mmio_init(void);
-- 
2.7.4



* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
@ 2021-03-19  1:10   ` Namhyung Kim
  2021-03-19 20:28     ` Liang, Kan
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
  2022-07-22 12:55   ` [PATCH V2 1/5] " Lucas De Marchi
  2 siblings, 1 reply; 22+ messages in thread
From: Namhyung Kim @ 2021-03-19  1:10 UTC (permalink / raw)
  To: Kan Liang
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Alexander Shishkin, Jiri Olsa, Stephane Eranian,
	Andi Kleen

Hi Kan,

On Thu, Mar 18, 2021 at 3:05 AM <kan.liang@linux.intel.com> wrote:
>
> From: Kan Liang <kan.liang@linux.intel.com>
>
> A self-describing mechanism for the uncore PerfMon hardware has been
> introduced with the latest Intel platforms. By reading through an MMIO
> page worth of information, perf can 'discover' all the standard uncore
> PerfMon registers in a machine.
>
> The discovery mechanism relies on BIOS's support. With a proper BIOS,
> a PCI device with the unique capability ID 0x23 can be found on each
> die. Perf can retrieve the information of all available uncore PerfMons
> from the device via MMIO. The information is composed of one global
> discovery table and several unit discovery tables.
> - The global discovery table includes global uncore information of the
>   die, e.g., the address of the global control register, the offset of
>   the global status register, the number of uncore units, the offset of
>   unit discovery tables, etc.
> - The unit discovery table includes generic uncore unit information,
>   e.g., the access type, the counter width, the address of counters,
>   the address of the counter control, the unit ID, the unit type, etc.
>   The unit is also called "box" in the code.
> Perf can provide basic uncore support based on this information
> with the following patches.
>
> To locate the PCI device with the discovery tables, check the generic
> PCI ID first. If it doesn't match, go through the entire PCI device tree
> and locate the device with the unique capability ID.
>
> The uncore information is similar among dies. To save parsing time and
> space, only completely parse and store the discovery tables on the first
> die and the first box of each die. The parsed information is stored in
> an
> RB tree structure, intel_uncore_discovery_type. The size of the stored
> discovery tables varies among platforms. It's around 4KB for a Sapphire
> Rapids server.
>
> If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
> will exit with -ENODEV. There is nothing changed.
>
> Add a module parameter to disable the discovery feature. If a BIOS gets
> the discovery tables wrong, users can have an option to disable the
> feature. For the current patchset, the uncore driver will exit with
> -ENODEV. In the future, it may fall back to the hardcode uncore driver
> on a known platform.
>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> ---
>  arch/x86/events/intel/Makefile           |   2 +-
>  arch/x86/events/intel/uncore.c           |  31 ++-
>  arch/x86/events/intel/uncore_discovery.c | 318 +++++++++++++++++++++++++++++++
>  arch/x86/events/intel/uncore_discovery.h | 105 ++++++++++
>  4 files changed, 448 insertions(+), 8 deletions(-)
>  create mode 100644 arch/x86/events/intel/uncore_discovery.c
>  create mode 100644 arch/x86/events/intel/uncore_discovery.h
>
> diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
> index e67a588..10bde6c 100644
> --- a/arch/x86/events/intel/Makefile
> +++ b/arch/x86/events/intel/Makefile
> @@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)             += core.o bts.o
>  obj-$(CONFIG_CPU_SUP_INTEL)            += ds.o knc.o
>  obj-$(CONFIG_CPU_SUP_INTEL)            += lbr.o p4.o p6.o pt.o
>  obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
> -intel-uncore-objs                      := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
> +intel-uncore-objs                      := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
>  obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE) += intel-cstate.o
>  intel-cstate-objs                      := cstate.o
> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
> index 33c8180..d111370 100644
> --- a/arch/x86/events/intel/uncore.c
> +++ b/arch/x86/events/intel/uncore.c
> @@ -4,7 +4,12 @@
>  #include <asm/cpu_device_id.h>
>  #include <asm/intel-family.h>
>  #include "uncore.h"
> +#include "uncore_discovery.h"
>
> +static bool uncore_no_discover;
> +module_param(uncore_no_discover, bool, 0);

Wouldn't it be better to use a positive form like 'uncore_discover = true'?
To disable, the module param can be set to 'uncore_discover = false'.

> +MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
> +                                    "(default: enable the discovery mechanism).");
>  static struct intel_uncore_type *empty_uncore[] = { NULL, };
>  struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
>  struct intel_uncore_type **uncore_pci_uncores = empty_uncore;

[SNIP]
> +enum uncore_access_type {
> +       UNCORE_ACCESS_MSR       = 0,
> +       UNCORE_ACCESS_MMIO,
> +       UNCORE_ACCESS_PCI,
> +
> +       UNCORE_ACCESS_MAX,
> +};
> +
> +struct uncore_global_discovery {
> +       union {
> +               u64     table1;
> +               struct {
> +                       u64     type : 8,
> +                               stride : 8,
> +                               max_units : 10,
> +                               __reserved_1 : 36,
> +                               access_type : 2;
> +               };
> +       };
> +
> +       u64     ctl;            /* Global Control Address */
> +
> +       union {
> +               u64     table3;
> +               struct {
> +                       u64     status_offset : 8,
> +                               num_status : 16,
> +                               __reserved_2 : 40;
> +               };
> +       };
> +};
> +
> +struct uncore_unit_discovery {
> +       union {
> +               u64     table1;
> +               struct {
> +                       u64     num_regs : 8,
> +                               ctl_offset : 8,
> +                               bit_width : 8,
> +                               ctr_offset : 8,
> +                               status_offset : 8,
> +                               __reserved_1 : 22,
> +                               access_type : 2;
> +                       };
> +               };
> +
> +       u64     ctl;            /* Unit Control Address */
> +
> +       union {
> +               u64     table3;
> +               struct {
> +                       u64     box_type : 16,
> +                               box_id : 16,
> +                               __reserved_2 : 32;
> +               };
> +       };
> +};
> +
> +struct intel_uncore_discovery_type {
> +       struct rb_node  node;
> +       enum uncore_access_type access_type;
> +       u64             box_ctrl;       /* Unit ctrl addr of the first box */
> +       u64             *box_ctrl_die;  /* Unit ctrl addr of the first box of each die */
> +       u16             type;           /* Type ID of the uncore block */
> +       u8              num_counters;
> +       u8              counter_width;
> +       u8              ctl_offset;     /* Counter Control 0 offset */
> +       u8              ctr_offset;     /* Counter 0 offset */

I find it confusing and easy to miss - ctl and ctr.  Some places you used
ctrl or counter.  Why not be consistent?  :)

Thanks,
Namhyung


> +       u16             num_boxes;      /* number of boxes for the uncore block */
> +       unsigned int    *ids;           /* Box IDs */
> +       unsigned int    *box_offset;    /* Box offset */
> +};
> +
> +bool intel_uncore_has_discovery_tables(void);
> +void intel_uncore_clear_discovery_tables(void);
> --
> 2.7.4
>


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2021-03-19  1:10   ` Namhyung Kim
@ 2021-03-19 20:28     ` Liang, Kan
  0 siblings, 0 replies; 22+ messages in thread
From: Liang, Kan @ 2021-03-19 20:28 UTC (permalink / raw)
  To: Namhyung Kim
  Cc: Peter Zijlstra, Ingo Molnar, Arnaldo Carvalho de Melo,
	linux-kernel, Alexander Shishkin, Jiri Olsa, Stephane Eranian,
	Andi Kleen



On 3/18/2021 9:10 PM, Namhyung Kim wrote:
> Hi Kan,
> 
> On Thu, Mar 18, 2021 at 3:05 AM <kan.liang@linux.intel.com> wrote:
>>
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> A self-describing mechanism for the uncore PerfMon hardware has been
>> introduced with the latest Intel platforms. By reading through an MMIO
>> page worth of information, perf can 'discover' all the standard uncore
>> PerfMon registers in a machine.
>>
>> The discovery mechanism relies on BIOS support. With a proper BIOS,
>> a PCI device with the unique capability ID 0x23 can be found on each
>> die. Perf can retrieve the information of all available uncore PerfMons
>> from the device via MMIO. The information is composed of one global
>> discovery table and several unit discovery tables.
>> - The global discovery table includes global uncore information of the
>>    die, e.g., the address of the global control register, the offset of
>>    the global status register, the number of uncore units, the offset of
>>    unit discovery tables, etc.
>> - The unit discovery table includes generic uncore unit information,
>>    e.g., the access type, the counter width, the address of counters,
>>    the address of the counter control, the unit ID, the unit type, etc.
>>    The unit is also called "box" in the code.
>> Perf can provide basic uncore support based on this information
>> with the following patches.
>>
>> To locate the PCI device with the discovery tables, check the generic
>> PCI ID first. If it doesn't match, go through the entire PCI device tree
>> and locate the device with the unique capability ID.
>>
>> The uncore information is similar among dies. To save parsing time and
>> space, the discovery tables are completely parsed and stored only for
>> the first die, and for the first box of each die. The parsed information
>> is stored in an RB tree structure, intel_uncore_discovery_type. The size
>> of the stored discovery tables varies among platforms. It's around 4KB
>> for a Sapphire Rapids server.
>>
>> If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
>> exits with -ENODEV, and nothing else is changed.
>>
>> Add a module parameter to disable the discovery feature. If a BIOS gets
>> the discovery tables wrong, users have an option to disable the feature.
>> For the current patchset, the uncore driver will exit with -ENODEV. In
>> the future, it may fall back to the hardcoded uncore driver on a known
>> platform.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>> ---
>>   arch/x86/events/intel/Makefile           |   2 +-
>>   arch/x86/events/intel/uncore.c           |  31 ++-
>>   arch/x86/events/intel/uncore_discovery.c | 318 +++++++++++++++++++++++++++++++
>>   arch/x86/events/intel/uncore_discovery.h | 105 ++++++++++
>>   4 files changed, 448 insertions(+), 8 deletions(-)
>>   create mode 100644 arch/x86/events/intel/uncore_discovery.c
>>   create mode 100644 arch/x86/events/intel/uncore_discovery.h
>>
>> diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
>> index e67a588..10bde6c 100644
>> --- a/arch/x86/events/intel/Makefile
>> +++ b/arch/x86/events/intel/Makefile
>> @@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)             += core.o bts.o
>>   obj-$(CONFIG_CPU_SUP_INTEL)            += ds.o knc.o
>>   obj-$(CONFIG_CPU_SUP_INTEL)            += lbr.o p4.o p6.o pt.o
>>   obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE) += intel-uncore.o
>> -intel-uncore-objs                      := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
>> +intel-uncore-objs                      := uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
>>   obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE) += intel-cstate.o
>>   intel-cstate-objs                      := cstate.o
>> diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>> index 33c8180..d111370 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -4,7 +4,12 @@
>>   #include <asm/cpu_device_id.h>
>>   #include <asm/intel-family.h>
>>   #include "uncore.h"
>> +#include "uncore_discovery.h"
>>
>> +static bool uncore_no_discover;
>> +module_param(uncore_no_discover, bool, 0);
> 
> Wouldn't it be better to use a positive form like 'uncore_discover = true'?
> To disable, the module param can be set to 'uncore_discover = false'.
> 

I'd like the feature to be enabled by default. The default value of a 
static variable is 0, so I use the current name. It's just a personal 
preference.

>> +MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
>> +                                    "(default: enable the discovery mechanism).");
>>   static struct intel_uncore_type *empty_uncore[] = { NULL, };
>>   struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
>>   struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
> 
> [SNIP]
>> +enum uncore_access_type {
>> +       UNCORE_ACCESS_MSR       = 0,
>> +       UNCORE_ACCESS_MMIO,
>> +       UNCORE_ACCESS_PCI,
>> +
>> +       UNCORE_ACCESS_MAX,
>> +};
>> +
>> +struct uncore_global_discovery {
>> +       union {
>> +               u64     table1;
>> +               struct {
>> +                       u64     type : 8,
>> +                               stride : 8,
>> +                               max_units : 10,
>> +                               __reserved_1 : 36,
>> +                               access_type : 2;
>> +               };
>> +       };
>> +
>> +       u64     ctl;            /* Global Control Address */
>> +
>> +       union {
>> +               u64     table3;
>> +               struct {
>> +                       u64     status_offset : 8,
>> +                               num_status : 16,
>> +                               __reserved_2 : 40;
>> +               };
>> +       };
>> +};
>> +
>> +struct uncore_unit_discovery {
>> +       union {
>> +               u64     table1;
>> +               struct {
>> +                       u64     num_regs : 8,
>> +                               ctl_offset : 8,
>> +                               bit_width : 8,
>> +                               ctr_offset : 8,
>> +                               status_offset : 8,
>> +                               __reserved_1 : 22,
>> +                               access_type : 2;
>> +                       };
>> +               };
>> +
>> +       u64     ctl;            /* Unit Control Address */
>> +
>> +       union {
>> +               u64     table3;
>> +               struct {
>> +                       u64     box_type : 16,
>> +                               box_id : 16,
>> +                               __reserved_2 : 32;
>> +               };
>> +       };
>> +};
>> +
>> +struct intel_uncore_discovery_type {
>> +       struct rb_node  node;
>> +       enum uncore_access_type access_type;
>> +       u64             box_ctrl;       /* Unit ctrl addr of the first box */
>> +       u64             *box_ctrl_die;  /* Unit ctrl addr of the first box of each die */
>> +       u16             type;           /* Type ID of the uncore block */
>> +       u8              num_counters;
>> +       u8              counter_width;
>> +       u8              ctl_offset;     /* Counter Control 0 offset */
>> +       u8              ctr_offset;     /* Counter 0 offset */
> 
> I find it confusing and easy to miss - ctl and ctr.  Some places you used
> ctrl or counter.  Why not be consistent?  :)
>

The ctl and ctr names are consistent with the variable names in struct 
intel_uncore_type.

The terms counter and counter control appear only in the comments.

I guess the naming should be OK. :)

Thanks,
Kan

> Thanks,
> Namhyung
> 
> 
>> +       u16             num_boxes;      /* number of boxes for the uncore block */
>> +       unsigned int    *ids;           /* Box IDs */
>> +       unsigned int    *box_offset;    /* Box offset */
>> +};
>> +
>> +bool intel_uncore_has_discovery_tables(void);
>> +void intel_uncore_clear_discovery_tables(void);
>> --
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* [tip: perf/core] perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks
  2021-03-17 17:59 ` [PATCH V2 5/5] perf/x86/intel/uncore: Generic support for the MMIO " kan.liang
@ 2021-04-02  8:12   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Kan Liang @ 2021-04-02  8:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     c4c55e362a521d763356b9e02bc9a4348c71a471
Gitweb:        https://git.kernel.org/tip/c4c55e362a521d763356b9e02bc9a4348c71a471
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Wed, 17 Mar 2021 10:59:37 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 02 Apr 2021 10:04:55 +02:00

perf/x86/intel/uncore: Generic support for the MMIO type of uncore blocks

The discovery table provides the generic uncore block information
for the MMIO type of uncore blocks, which is good enough to provide
basic uncore support.

The box control field is composed of the BAR address and box control
offset. When initializing the uncore blocks, perf should ioremap the
address from the box control field.

Implement the generic support for the MMIO type of uncore block.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-6-git-send-email-kan.liang@linux.intel.com
---
 arch/x86/events/intel/uncore.c           |  1 +-
 arch/x86/events/intel/uncore.h           |  1 +-
 arch/x86/events/intel/uncore_discovery.c | 98 +++++++++++++++++++++++-
 arch/x86/events/intel/uncore_discovery.h |  1 +-
 4 files changed, 101 insertions(+)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 3109082..35b3470 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1755,6 +1755,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
 	.cpu_init = intel_uncore_generic_uncore_cpu_init,
 	.pci_init = intel_uncore_generic_uncore_pci_init,
+	.mmio_init = intel_uncore_generic_uncore_mmio_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 76fc898..549cfb2 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -70,6 +70,7 @@ struct intel_uncore_type {
 	union {
 		unsigned *msr_offsets;
 		unsigned *pci_offsets;
+		unsigned *mmio_offsets;
 	};
 	unsigned *box_ids;
 	struct event_constraint unconstrainted;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 784d7b4..aba9bff 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -442,6 +442,90 @@ static struct intel_uncore_ops generic_uncore_pci_ops = {
 	.read_counter	= intel_generic_uncore_pci_read_counter,
 };
 
+#define UNCORE_GENERIC_MMIO_SIZE		0x4000
+
+static unsigned int generic_uncore_mmio_box_ctl(struct intel_uncore_box *box)
+{
+	struct intel_uncore_type *type = box->pmu->type;
+
+	if (!type->box_ctls || !type->box_ctls[box->dieid] || !type->mmio_offsets)
+		return 0;
+
+	return type->box_ctls[box->dieid] + type->mmio_offsets[box->pmu->pmu_idx];
+}
+
+static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+{
+	unsigned int box_ctl = generic_uncore_mmio_box_ctl(box);
+	struct intel_uncore_type *type = box->pmu->type;
+	resource_size_t addr;
+
+	if (!box_ctl) {
+		pr_warn("Uncore type %d box %d: Invalid box control address.\n",
+			type->type_id, type->box_ids[box->pmu->pmu_idx]);
+		return;
+	}
+
+	addr = box_ctl;
+	box->io_addr = ioremap(addr, UNCORE_GENERIC_MMIO_SIZE);
+	if (!box->io_addr) {
+		pr_warn("Uncore type %d box %d: ioremap error for 0x%llx.\n",
+			type->type_id, type->box_ids[box->pmu->pmu_idx],
+			(unsigned long long)addr);
+		return;
+	}
+
+	writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
+{
+	if (!box->io_addr)
+		return;
+
+	writel(GENERIC_PMON_BOX_CTL_FRZ, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
+{
+	if (!box->io_addr)
+		return;
+
+	writel(0, box->io_addr);
+}
+
+static void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!box->io_addr)
+		return;
+
+	writel(hwc->config, box->io_addr + hwc->config_base);
+}
+
+static void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+					      struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!box->io_addr)
+		return;
+
+	writel(0, box->io_addr + hwc->config_base);
+}
+
+static struct intel_uncore_ops generic_uncore_mmio_ops = {
+	.init_box	= intel_generic_uncore_mmio_init_box,
+	.exit_box	= uncore_mmio_exit_box,
+	.disable_box	= intel_generic_uncore_mmio_disable_box,
+	.enable_box	= intel_generic_uncore_mmio_enable_box,
+	.disable_event	= intel_generic_uncore_mmio_disable_event,
+	.enable_event	= intel_generic_uncore_mmio_enable_event,
+	.read_counter	= uncore_mmio_read_counter,
+};
+
 static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 				      struct intel_uncore_type *uncore,
 				      struct intel_uncore_discovery_type *type)
@@ -468,6 +552,15 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 		uncore->box_ctls = type->box_ctrl_die;
 		uncore->pci_offsets = type->box_offset;
 		break;
+	case UNCORE_ACCESS_MMIO:
+		uncore->ops = &generic_uncore_mmio_ops;
+		uncore->perf_ctr = (unsigned int)type->ctr_offset;
+		uncore->event_ctl = (unsigned int)type->ctl_offset;
+		uncore->box_ctl = (unsigned int)type->box_ctrl;
+		uncore->box_ctls = type->box_ctrl_die;
+		uncore->mmio_offsets = type->box_offset;
+		uncore->mmio_map_size = UNCORE_GENERIC_MMIO_SIZE;
+		break;
 	default:
 		return false;
 	}
@@ -522,3 +615,8 @@ int intel_uncore_generic_uncore_pci_init(void)
 
 	return 0;
 }
+
+void intel_uncore_generic_uncore_mmio_init(void)
+{
+	uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO);
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 1639ff7..1d65293 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -128,3 +128,4 @@ bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
 void intel_uncore_generic_uncore_cpu_init(void);
 int intel_uncore_generic_uncore_pci_init(void);
+void intel_uncore_generic_uncore_mmio_init(void);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: perf/core] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier
  2021-03-17 17:59 ` [PATCH V2 3/5] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier kan.liang
@ 2021-04-02  8:12   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Kan Liang @ 2021-04-02  8:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     6477dc3934775f82a571fac469fd8c348e611095
Gitweb:        https://git.kernel.org/tip/6477dc3934775f82a571fac469fd8c348e611095
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Wed, 17 Mar 2021 10:59:35 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier

Perf will use a similar method to the PCI sub driver to register
the PMUs for the PCI type of uncore blocks. The method requires a BUS
notifier to support hotplug. The current BUS notifier cannot be reused,
because it searches a const id_table for the corresponding registered
PMU. The PCI type of uncore blocks in the discovery tables doesn't
provide an id_table.

Factor out uncore_bus_notify() and add a pointer to an id_table as a
parameter. uncore_bus_notify() will be reused in the following patch.

The current BUS notifier is only used by the PCI sub driver. Its name is
too generic. Rename it to uncore_pci_sub_notifier, which is specific for
the PCI sub driver.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-4-git-send-email-kan.liang@linux.intel.com
---
 arch/x86/events/intel/uncore.c | 20 ++++++++++++++------
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index dabc01f..391fa7c 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1203,7 +1203,8 @@ static void uncore_pci_remove(struct pci_dev *pdev)
 }
 
 static int uncore_bus_notify(struct notifier_block *nb,
-			     unsigned long action, void *data)
+			     unsigned long action, void *data,
+			     const struct pci_device_id *ids)
 {
 	struct device *dev = data;
 	struct pci_dev *pdev = to_pci_dev(dev);
@@ -1214,7 +1215,7 @@ static int uncore_bus_notify(struct notifier_block *nb,
 	if (action != BUS_NOTIFY_DEL_DEVICE)
 		return NOTIFY_DONE;
 
-	pmu = uncore_pci_find_dev_pmu(pdev, uncore_pci_sub_driver->id_table);
+	pmu = uncore_pci_find_dev_pmu(pdev, ids);
 	if (!pmu)
 		return NOTIFY_DONE;
 
@@ -1226,8 +1227,15 @@ static int uncore_bus_notify(struct notifier_block *nb,
 	return NOTIFY_OK;
 }
 
-static struct notifier_block uncore_notifier = {
-	.notifier_call = uncore_bus_notify,
+static int uncore_pci_sub_bus_notify(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	return uncore_bus_notify(nb, action, data,
+				 uncore_pci_sub_driver->id_table);
+}
+
+static struct notifier_block uncore_pci_sub_notifier = {
+	.notifier_call = uncore_pci_sub_bus_notify,
 };
 
 static void uncore_pci_sub_driver_init(void)
@@ -1268,7 +1276,7 @@ static void uncore_pci_sub_driver_init(void)
 		ids++;
 	}
 
-	if (notify && bus_register_notifier(&pci_bus_type, &uncore_notifier))
+	if (notify && bus_register_notifier(&pci_bus_type, &uncore_pci_sub_notifier))
 		notify = false;
 
 	if (!notify)
@@ -1319,7 +1327,7 @@ static void uncore_pci_exit(void)
 	if (pcidrv_registered) {
 		pcidrv_registered = false;
 		if (uncore_pci_sub_driver)
-			bus_unregister_notifier(&pci_bus_type, &uncore_notifier);
+			bus_unregister_notifier(&pci_bus_type, &uncore_pci_sub_notifier);
 		pci_unregister_driver(uncore_pci_driver);
 		uncore_types_exit(uncore_pci_uncores);
 		kfree(uncore_extra_pci_dev);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: perf/core] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks
  2021-03-17 17:59 ` [PATCH V2 4/5] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks kan.liang
@ 2021-04-02  8:12   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Kan Liang @ 2021-04-02  8:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     42839ef4a20a4bda415974ff0e7d85ff540fffa4
Gitweb:        https://git.kernel.org/tip/42839ef4a20a4bda415974ff0e7d85ff540fffa4
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Wed, 17 Mar 2021 10:59:36 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 02 Apr 2021 10:04:55 +02:00

perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks

The discovery table provides the generic uncore block information
for the PCI type of uncore blocks, which is good enough to provide
basic uncore support.

The PCI BUS and DEVFN information can be retrieved from the box control
field. Introduce the uncore_pci_pmus_register() to register all the
PCICFG type of uncore blocks. The old PCI probe/remove way is dropped.

The PCI BUS and DEVFN information are different among dies. Add box_ctls
to store the box control field of each die.

Add a new BUS notifier for the PCI type of uncore block to support
hotplug. If the device is hot removed, the corresponding registered PMU
has to be unregistered.
pci_device_id table, because the discovery tables don't provide such
information. Introduce uncore_pci_find_dev_pmu_from_types() to search
the whole uncore_pci_uncores for the PMU.

Implement generic support for the PCI type of uncore block.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-5-git-send-email-kan.liang@linux.intel.com
---
 arch/x86/events/intel/uncore.c           | 91 +++++++++++++++++++++--
 arch/x86/events/intel/uncore.h           |  6 +-
 arch/x86/events/intel/uncore_discovery.c | 80 ++++++++++++++++++++-
 arch/x86/events/intel/uncore_discovery.h |  7 ++-
 4 files changed, 177 insertions(+), 7 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 391fa7c..3109082 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1032,10 +1032,37 @@ static int uncore_pci_get_dev_die_info(struct pci_dev *pdev, int *die)
 	return 0;
 }
 
+static struct intel_uncore_pmu *
+uncore_pci_find_dev_pmu_from_types(struct pci_dev *pdev)
+{
+	struct intel_uncore_type **types = uncore_pci_uncores;
+	struct intel_uncore_type *type;
+	u64 box_ctl;
+	int i, die;
+
+	for (; *types; types++) {
+		type = *types;
+		for (die = 0; die < __uncore_max_dies; die++) {
+			for (i = 0; i < type->num_boxes; i++) {
+				if (!type->box_ctls[die])
+					continue;
+				box_ctl = type->box_ctls[die] + type->pci_offsets[i];
+				if (pdev->devfn == UNCORE_DISCOVERY_PCI_DEVFN(box_ctl) &&
+				    pdev->bus->number == UNCORE_DISCOVERY_PCI_BUS(box_ctl) &&
+				    pci_domain_nr(pdev->bus) == UNCORE_DISCOVERY_PCI_DOMAIN(box_ctl))
+					return &type->pmus[i];
+			}
+		}
+	}
+
+	return NULL;
+}
+
 /*
  * Find the PMU of a PCI device.
  * @pdev: The PCI device.
  * @ids: The ID table of the available PCI devices with a PMU.
+ *       If NULL, search the whole uncore_pci_uncores.
  */
 static struct intel_uncore_pmu *
 uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
@@ -1045,6 +1072,9 @@ uncore_pci_find_dev_pmu(struct pci_dev *pdev, const struct pci_device_id *ids)
 	kernel_ulong_t data;
 	unsigned int devfn;
 
+	if (!ids)
+		return uncore_pci_find_dev_pmu_from_types(pdev);
+
 	while (ids && ids->vendor) {
 		if ((ids->vendor == pdev->vendor) &&
 		    (ids->device == pdev->device)) {
@@ -1283,6 +1313,48 @@ static void uncore_pci_sub_driver_init(void)
 		uncore_pci_sub_driver = NULL;
 }
 
+static int uncore_pci_bus_notify(struct notifier_block *nb,
+				     unsigned long action, void *data)
+{
+	return uncore_bus_notify(nb, action, data, NULL);
+}
+
+static struct notifier_block uncore_pci_notifier = {
+	.notifier_call = uncore_pci_bus_notify,
+};
+
+
+static void uncore_pci_pmus_register(void)
+{
+	struct intel_uncore_type **types = uncore_pci_uncores;
+	struct intel_uncore_type *type;
+	struct intel_uncore_pmu *pmu;
+	struct pci_dev *pdev;
+	u64 box_ctl;
+	int i, die;
+
+	for (; *types; types++) {
+		type = *types;
+		for (die = 0; die < __uncore_max_dies; die++) {
+			for (i = 0; i < type->num_boxes; i++) {
+				if (!type->box_ctls[die])
+					continue;
+				box_ctl = type->box_ctls[die] + type->pci_offsets[i];
+				pdev = pci_get_domain_bus_and_slot(UNCORE_DISCOVERY_PCI_DOMAIN(box_ctl),
+								   UNCORE_DISCOVERY_PCI_BUS(box_ctl),
+								   UNCORE_DISCOVERY_PCI_DEVFN(box_ctl));
+				if (!pdev)
+					continue;
+				pmu = &type->pmus[i];
+
+				uncore_pci_pmu_register(pdev, type, pmu, die);
+			}
+		}
+	}
+
+	bus_register_notifier(&pci_bus_type, &uncore_pci_notifier);
+}
+
 static int __init uncore_pci_init(void)
 {
 	size_t size;
@@ -1299,12 +1371,15 @@ static int __init uncore_pci_init(void)
 	if (ret)
 		goto errtype;
 
-	uncore_pci_driver->probe = uncore_pci_probe;
-	uncore_pci_driver->remove = uncore_pci_remove;
+	if (uncore_pci_driver) {
+		uncore_pci_driver->probe = uncore_pci_probe;
+		uncore_pci_driver->remove = uncore_pci_remove;
 
-	ret = pci_register_driver(uncore_pci_driver);
-	if (ret)
-		goto errtype;
+		ret = pci_register_driver(uncore_pci_driver);
+		if (ret)
+			goto errtype;
+	} else
+		uncore_pci_pmus_register();
 
 	if (uncore_pci_sub_driver)
 		uncore_pci_sub_driver_init();
@@ -1328,7 +1403,10 @@ static void uncore_pci_exit(void)
 		pcidrv_registered = false;
 		if (uncore_pci_sub_driver)
 			bus_unregister_notifier(&pci_bus_type, &uncore_pci_sub_notifier);
-		pci_unregister_driver(uncore_pci_driver);
+		if (uncore_pci_driver)
+			pci_unregister_driver(uncore_pci_driver);
+		else
+			bus_unregister_notifier(&pci_bus_type, &uncore_pci_notifier);
 		uncore_types_exit(uncore_pci_uncores);
 		kfree(uncore_extra_pci_dev);
 		uncore_free_pcibus_map();
@@ -1676,6 +1754,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
 	.cpu_init = intel_uncore_generic_uncore_cpu_init,
+	.pci_init = intel_uncore_generic_uncore_pci_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 05c8e06..76fc898 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -58,6 +58,7 @@ struct intel_uncore_type {
 	unsigned fixed_ctr;
 	unsigned fixed_ctl;
 	unsigned box_ctl;
+	u64 *box_ctls;	/* Unit ctrl addr of the first box of each die */
 	union {
 		unsigned msr_offset;
 		unsigned mmio_offset;
@@ -66,7 +67,10 @@ struct intel_uncore_type {
 	unsigned num_shared_regs:8;
 	unsigned single_fixed:1;
 	unsigned pair_ctr_ctl:1;
-	unsigned *msr_offsets;
+	union {
+		unsigned *msr_offsets;
+		unsigned *pci_offsets;
+	};
 	unsigned *box_ids;
 	struct event_constraint unconstrainted;
 	struct event_constraint *constraints;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index fefb3e2..784d7b4 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -377,6 +377,71 @@ static struct intel_uncore_ops generic_uncore_msr_ops = {
 	.read_counter		= uncore_msr_read_counter,
 };
 
+static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	__set_bit(UNCORE_BOX_FLAG_CTL_OFFS8, &box->flags);
+	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
+}
+
+static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_FRZ);
+}
+
+static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	int box_ctl = uncore_pci_box_ctl(box);
+
+	pci_write_config_dword(pdev, box_ctl, 0);
+}
+
+static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
+					    struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, hwc->config);
+}
+
+static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base, 0);
+}
+
+static u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+					   struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+	u64 count = 0;
+
+	pci_read_config_dword(pdev, hwc->event_base, (u32 *)&count);
+	pci_read_config_dword(pdev, hwc->event_base + 4, (u32 *)&count + 1);
+
+	return count;
+}
+
+static struct intel_uncore_ops generic_uncore_pci_ops = {
+	.init_box	= intel_generic_uncore_pci_init_box,
+	.disable_box	= intel_generic_uncore_pci_disable_box,
+	.enable_box	= intel_generic_uncore_pci_enable_box,
+	.disable_event	= intel_generic_uncore_pci_disable_event,
+	.enable_event	= intel_generic_uncore_pci_enable_event,
+	.read_counter	= intel_generic_uncore_pci_read_counter,
+};
+
 static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 				      struct intel_uncore_type *uncore,
 				      struct intel_uncore_discovery_type *type)
@@ -395,6 +460,14 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 		uncore->box_ctl = (unsigned int)type->box_ctrl;
 		uncore->msr_offsets = type->box_offset;
 		break;
+	case UNCORE_ACCESS_PCI:
+		uncore->ops = &generic_uncore_pci_ops;
+		uncore->perf_ctr = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl) + type->ctr_offset;
+		uncore->event_ctl = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl) + type->ctl_offset;
+		uncore->box_ctl = (unsigned int)UNCORE_DISCOVERY_PCI_BOX_CTRL(type->box_ctrl);
+		uncore->box_ctls = type->box_ctrl_die;
+		uncore->pci_offsets = type->box_offset;
+		break;
 	default:
 		return false;
 	}
@@ -442,3 +515,10 @@ void intel_uncore_generic_uncore_cpu_init(void)
 {
 	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
 }
+
+int intel_uncore_generic_uncore_pci_init(void)
+{
+	uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI);
+
+	return 0;
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 87078ba..1639ff7 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -23,6 +23,12 @@
 /* Global discovery table size */
 #define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE	0x20
 
+#define UNCORE_DISCOVERY_PCI_DOMAIN(data)	((data >> 28) & 0x7)
+#define UNCORE_DISCOVERY_PCI_BUS(data)		((data >> 20) & 0xff)
+#define UNCORE_DISCOVERY_PCI_DEVFN(data)	((data >> 12) & 0xff)
+#define UNCORE_DISCOVERY_PCI_BOX_CTRL(data)	(data & 0xfff)
+
+
 #define uncore_discovery_invalid_unit(unit)			\
 	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
 	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
@@ -121,3 +127,4 @@ struct intel_uncore_discovery_type {
 bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
 void intel_uncore_generic_uncore_cpu_init(void);
+int intel_uncore_generic_uncore_pci_init(void);

^ permalink raw reply related	[flat|nested] 22+ messages in thread

* [tip: perf/core] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks
  2021-03-17 17:59 ` [PATCH V2 2/5] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks kan.liang
@ 2021-04-02  8:12   ` tip-bot2 for Kan Liang
  0 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Kan Liang @ 2021-04-02  8:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     d6c754130435ab786711bed75d04a2388a6b4da8
Gitweb:        https://git.kernel.org/tip/d6c754130435ab786711bed75d04a2388a6b4da8
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Wed, 17 Mar 2021 10:59:34 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks

The discovery table provides the generic uncore block information for
the MSR type of uncore blocks, e.g., the counter width, the number of
counters, the location of control/counter registers, which is good
enough to provide basic uncore support. It can be used as a fallback
solution when the kernel doesn't support a platform.

The name of the uncore box cannot be retrieved from the discovery table.
uncore_type_<typeID>_<boxID> will be used as its name. Save the type ID
and the box ID information in the struct intel_uncore_type.
Factor out uncore_get_pmu_name() to handle different naming methods.

Implement generic support for the MSR type of uncore block.

Some advanced features, such as filters and constraints, cannot be
retrieved from discovery tables. Features that rely on that
information are not supported here.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-3-git-send-email-kan.liang@linux.intel.com
---
 arch/x86/events/intel/uncore.c           |  45 ++++++--
 arch/x86/events/intel/uncore.h           |   3 +-
 arch/x86/events/intel/uncore_discovery.c | 126 ++++++++++++++++++++++-
 arch/x86/events/intel/uncore_discovery.h |  18 +++-
 4 files changed, 182 insertions(+), 10 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index d111370..dabc01f 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -10,7 +10,7 @@ static bool uncore_no_discover;
 module_param(uncore_no_discover, bool, 0);
 MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
 				     "(default: enable the discovery mechanism).");
-static struct intel_uncore_type *empty_uncore[] = { NULL, };
+struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
 struct intel_uncore_type **uncore_mmio_uncores = empty_uncore;
@@ -834,6 +834,34 @@ static const struct attribute_group uncore_pmu_attr_group = {
 	.attrs = uncore_pmu_attrs,
 };
 
+static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
+{
+	struct intel_uncore_type *type = pmu->type;
+
+	/*
+	 * No uncore block name in discovery table.
+	 * Use uncore_type_&typeid_&boxid as name.
+	 */
+	if (!type->name) {
+		if (type->num_boxes == 1)
+			sprintf(pmu->name, "uncore_type_%u", type->type_id);
+		else {
+			sprintf(pmu->name, "uncore_type_%u_%d",
+				type->type_id, type->box_ids[pmu->pmu_idx]);
+		}
+		return;
+	}
+
+	if (type->num_boxes == 1) {
+		if (strlen(type->name) > 0)
+			sprintf(pmu->name, "uncore_%s", type->name);
+		else
+			sprintf(pmu->name, "uncore");
+	} else
+		sprintf(pmu->name, "uncore_%s_%d", type->name, pmu->pmu_idx);
+
+}
+
 static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 {
 	int ret;
@@ -860,15 +888,7 @@ static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
 		pmu->pmu.attr_update = pmu->type->attr_update;
 	}
 
-	if (pmu->type->num_boxes == 1) {
-		if (strlen(pmu->type->name) > 0)
-			sprintf(pmu->name, "uncore_%s", pmu->type->name);
-		else
-			sprintf(pmu->name, "uncore");
-	} else {
-		sprintf(pmu->name, "uncore_%s_%d", pmu->type->name,
-			pmu->pmu_idx);
-	}
+	uncore_get_pmu_name(pmu);
 
 	ret = perf_pmu_register(&pmu->pmu, pmu->name, -1);
 	if (!ret)
@@ -909,6 +929,10 @@ static void uncore_type_exit(struct intel_uncore_type *type)
 		kfree(type->pmus);
 		type->pmus = NULL;
 	}
+	if (type->box_ids) {
+		kfree(type->box_ids);
+		type->box_ids = NULL;
+	}
 	kfree(type->events_group);
 	type->events_group = NULL;
 }
@@ -1643,6 +1667,7 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 };
 
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
+	.cpu_init = intel_uncore_generic_uncore_cpu_init,
 };
 
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index a3c6e16..05c8e06 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -50,6 +50,7 @@ struct intel_uncore_type {
 	int perf_ctr_bits;
 	int fixed_ctr_bits;
 	int num_freerunning_types;
+	int type_id;
 	unsigned perf_ctr;
 	unsigned event_ctl;
 	unsigned event_mask;
@@ -66,6 +67,7 @@ struct intel_uncore_type {
 	unsigned single_fixed:1;
 	unsigned pair_ctr_ctl:1;
 	unsigned *msr_offsets;
+	unsigned *box_ids;
 	struct event_constraint unconstrainted;
 	struct event_constraint *constraints;
 	struct intel_uncore_pmu *pmus;
@@ -547,6 +549,7 @@ uncore_get_constraint(struct intel_uncore_box *box, struct perf_event *event);
 void uncore_put_constraint(struct intel_uncore_box *box, struct perf_event *event);
 u64 uncore_shared_reg_config(struct intel_uncore_box *box, int idx);
 
+extern struct intel_uncore_type *empty_uncore[];
 extern struct intel_uncore_type **uncore_msr_uncores;
 extern struct intel_uncore_type **uncore_pci_uncores;
 extern struct intel_uncore_type **uncore_mmio_uncores;
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 7519ce3..fefb3e2 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -316,3 +316,129 @@ void intel_uncore_clear_discovery_tables(void)
 		kfree(type);
 	}
 }
+
+DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
+DEFINE_UNCORE_FORMAT_ATTR(umask, umask, "config:8-15");
+DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
+DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
+DEFINE_UNCORE_FORMAT_ATTR(thresh, thresh, "config:24-31");
+
+static struct attribute *generic_uncore_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh.attr,
+	NULL,
+};
+
+static const struct attribute_group generic_uncore_format_group = {
+	.name = "format",
+	.attrs = generic_uncore_formats_attr,
+};
+
+static void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
+}
+
+static void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_FRZ);
+}
+
+static void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
+{
+	wrmsrl(uncore_msr_box_ctl(box), 0);
+}
+
+static void intel_generic_uncore_msr_enable_event(struct intel_uncore_box *box,
+					    struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, hwc->config);
+}
+
+static void intel_generic_uncore_msr_disable_event(struct intel_uncore_box *box,
+					     struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	wrmsrl(hwc->config_base, 0);
+}
+
+static struct intel_uncore_ops generic_uncore_msr_ops = {
+	.init_box		= intel_generic_uncore_msr_init_box,
+	.disable_box		= intel_generic_uncore_msr_disable_box,
+	.enable_box		= intel_generic_uncore_msr_enable_box,
+	.disable_event		= intel_generic_uncore_msr_disable_event,
+	.enable_event		= intel_generic_uncore_msr_enable_event,
+	.read_counter		= uncore_msr_read_counter,
+};
+
+static bool uncore_update_uncore_type(enum uncore_access_type type_id,
+				      struct intel_uncore_type *uncore,
+				      struct intel_uncore_discovery_type *type)
+{
+	uncore->type_id = type->type;
+	uncore->num_boxes = type->num_boxes;
+	uncore->num_counters = type->num_counters;
+	uncore->perf_ctr_bits = type->counter_width;
+	uncore->box_ids = type->ids;
+
+	switch (type_id) {
+	case UNCORE_ACCESS_MSR:
+		uncore->ops = &generic_uncore_msr_ops;
+		uncore->perf_ctr = (unsigned int)type->box_ctrl + type->ctr_offset;
+		uncore->event_ctl = (unsigned int)type->box_ctrl + type->ctl_offset;
+		uncore->box_ctl = (unsigned int)type->box_ctrl;
+		uncore->msr_offsets = type->box_offset;
+		break;
+	default:
+		return false;
+	}
+
+	return true;
+}
+
+static struct intel_uncore_type **
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
+{
+	struct intel_uncore_discovery_type *type;
+	struct intel_uncore_type **uncores;
+	struct intel_uncore_type *uncore;
+	struct rb_node *node;
+	int i = 0;
+
+	uncores = kcalloc(num_discovered_types[type_id] + 1,
+			  sizeof(struct intel_uncore_type *), GFP_KERNEL);
+	if (!uncores)
+		return empty_uncore;
+
+	for (node = rb_first(&discovery_tables); node; node = rb_next(node)) {
+		type = rb_entry(node, struct intel_uncore_discovery_type, node);
+		if (type->access_type != type_id)
+			continue;
+
+		uncore = kzalloc(sizeof(struct intel_uncore_type), GFP_KERNEL);
+		if (!uncore)
+			break;
+
+		uncore->event_mask = GENERIC_PMON_RAW_EVENT_MASK;
+		uncore->format_group = &generic_uncore_format_group;
+
+		if (!uncore_update_uncore_type(type_id, uncore, type)) {
+			kfree(uncore);
+			continue;
+		}
+		uncores[i++] = uncore;
+	}
+
+	return uncores;
+}
+
+void intel_uncore_generic_uncore_cpu_init(void)
+{
+	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 95afa39..87078ba 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -28,6 +28,23 @@
 	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
 	 unit.table3 == -1ULL)
 
+#define GENERIC_PMON_CTL_EV_SEL_MASK	0x000000ff
+#define GENERIC_PMON_CTL_UMASK_MASK	0x0000ff00
+#define GENERIC_PMON_CTL_EDGE_DET	(1 << 18)
+#define GENERIC_PMON_CTL_INVERT		(1 << 23)
+#define GENERIC_PMON_CTL_TRESH_MASK	0xff000000
+#define GENERIC_PMON_RAW_EVENT_MASK	(GENERIC_PMON_CTL_EV_SEL_MASK | \
+					 GENERIC_PMON_CTL_UMASK_MASK | \
+					 GENERIC_PMON_CTL_EDGE_DET | \
+					 GENERIC_PMON_CTL_INVERT | \
+					 GENERIC_PMON_CTL_TRESH_MASK)
+
+#define GENERIC_PMON_BOX_CTL_FRZ	(1 << 0)
+#define GENERIC_PMON_BOX_CTL_RST_CTRL	(1 << 8)
+#define GENERIC_PMON_BOX_CTL_RST_CTRS	(1 << 9)
+#define GENERIC_PMON_BOX_CTL_INT	(GENERIC_PMON_BOX_CTL_RST_CTRL | \
+					 GENERIC_PMON_BOX_CTL_RST_CTRS)
+
 enum uncore_access_type {
 	UNCORE_ACCESS_MSR	= 0,
 	UNCORE_ACCESS_MMIO,
@@ -103,3 +120,4 @@ struct intel_uncore_discovery_type {
 
 bool intel_uncore_has_discovery_tables(void);
 void intel_uncore_clear_discovery_tables(void);
+void intel_uncore_generic_uncore_cpu_init(void);


* [tip: perf/core] perf/x86/intel/uncore: Parse uncore discovery tables
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
  2021-03-19  1:10   ` Namhyung Kim
@ 2021-04-02  8:12   ` tip-bot2 for Kan Liang
  2022-07-22 12:55   ` [PATCH V2 1/5] " Lucas De Marchi
  2 siblings, 0 replies; 22+ messages in thread
From: tip-bot2 for Kan Liang @ 2021-04-02  8:12 UTC (permalink / raw)
  To: linux-tip-commits; +Cc: Kan Liang, Peter Zijlstra (Intel), x86, linux-kernel

The following commit has been merged into the perf/core branch of tip:

Commit-ID:     edae1f06c2cda41edffc93de6aedc8ba8dc883c3
Gitweb:        https://git.kernel.org/tip/edae1f06c2cda41edffc93de6aedc8ba8dc883c3
Author:        Kan Liang <kan.liang@linux.intel.com>
AuthorDate:    Wed, 17 Mar 2021 10:59:33 -07:00
Committer:     Peter Zijlstra <peterz@infradead.org>
CommitterDate: Fri, 02 Apr 2021 10:04:54 +02:00

perf/x86/intel/uncore: Parse uncore discovery tables

A self-describing mechanism for the uncore PerfMon hardware has been
introduced with the latest Intel platforms. By reading through an MMIO
page worth of information, perf can 'discover' all the standard uncore
PerfMon registers in a machine.

The discovery mechanism relies on BIOS support. With a proper BIOS,
a PCI device with the unique capability ID 0x23 can be found on each
die. Perf can retrieve the information of all available uncore PerfMons
from the device via MMIO. The information is composed of one global
discovery table and several unit discovery tables.
- The global discovery table includes global uncore information of the
  die, e.g., the address of the global control register, the offset of
  the global status register, the number of uncore units, the offset of
  unit discovery tables, etc.
- The unit discovery table includes generic uncore unit information,
  e.g., the access type, the counter width, the address of counters,
  the address of the counter control, the unit ID, the unit type, etc.
  The unit is also called "box" in the code.
Perf can provide basic uncore support based on this information
with the following patches.

To locate the PCI device with the discovery tables, check the generic
PCI ID first. If it doesn't match, go through the entire PCI device tree
and locate the device with the unique capability ID.

The uncore information is similar among dies. To save parsing time and
space, only completely parse and store the discovery tables on the first
die and the first box of each die. The parsed information is stored in
an RB tree structure, intel_uncore_discovery_type. The size of the stored
discovery tables varies among platforms. It's around 4KB for a Sapphire
Rapids server.

If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
will exit with -ENODEV, and nothing is changed.

Add a module parameter to disable the discovery feature. If a BIOS gets
the discovery tables wrong, users can have an option to disable the
feature. For the current patchset, the uncore driver will exit with
-ENODEV. In the future, it may fall back to the hardcoded uncore driver
on a known platform.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
Link: https://lkml.kernel.org/r/1616003977-90612-2-git-send-email-kan.liang@linux.intel.com
---
 arch/x86/events/intel/Makefile           |   2 +-
 arch/x86/events/intel/uncore.c           |  31 +-
 arch/x86/events/intel/uncore_discovery.c | 318 ++++++++++++++++++++++-
 arch/x86/events/intel/uncore_discovery.h | 105 +++++++-
 4 files changed, 448 insertions(+), 8 deletions(-)
 create mode 100644 arch/x86/events/intel/uncore_discovery.c
 create mode 100644 arch/x86/events/intel/uncore_discovery.h

diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
index e67a588..10bde6c 100644
--- a/arch/x86/events/intel/Makefile
+++ b/arch/x86/events/intel/Makefile
@@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= ds.o knc.o
 obj-$(CONFIG_CPU_SUP_INTEL)		+= lbr.o p4.o p6.o pt.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= intel-uncore.o
-intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
+intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
 obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE)	+= intel-cstate.o
 intel-cstate-objs			:= cstate.o
diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 33c8180..d111370 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -4,7 +4,12 @@
 #include <asm/cpu_device_id.h>
 #include <asm/intel-family.h>
 #include "uncore.h"
+#include "uncore_discovery.h"
 
+static bool uncore_no_discover;
+module_param(uncore_no_discover, bool, 0);
+MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
+				     "(default: enable the discovery mechanism).");
 static struct intel_uncore_type *empty_uncore[] = { NULL, };
 struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
 struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
@@ -1637,6 +1642,9 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 	.mmio_init = snr_uncore_mmio_init,
 };
 
+static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
+};
+
 static const struct x86_cpu_id intel_uncore_match[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP,		&nhm_uncore_init),
 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,		&nhm_uncore_init),
@@ -1684,17 +1692,21 @@ static int __init intel_uncore_init(void)
 	struct intel_uncore_init_fun *uncore_init;
 	int pret = 0, cret = 0, mret = 0, ret;
 
-	id = x86_match_cpu(intel_uncore_match);
-	if (!id)
-		return -ENODEV;
-
 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
 		return -ENODEV;
 
 	__uncore_max_dies =
 		topology_max_packages() * topology_max_die_per_package();
 
-	uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+	id = x86_match_cpu(intel_uncore_match);
+	if (!id) {
+		if (!uncore_no_discover && intel_uncore_has_discovery_tables())
+			uncore_init = (struct intel_uncore_init_fun *)&generic_uncore_init;
+		else
+			return -ENODEV;
+	} else
+		uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+
 	if (uncore_init->pci_init) {
 		pret = uncore_init->pci_init();
 		if (!pret)
@@ -1711,8 +1723,10 @@ static int __init intel_uncore_init(void)
 		mret = uncore_mmio_init();
 	}
 
-	if (cret && pret && mret)
-		return -ENODEV;
+	if (cret && pret && mret) {
+		ret = -ENODEV;
+		goto free_discovery;
+	}
 
 	/* Install hotplug callbacks to setup the targets for each package */
 	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE,
@@ -1727,6 +1741,8 @@ err:
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_types_exit(uncore_mmio_uncores);
 	uncore_pci_exit();
+free_discovery:
+	intel_uncore_clear_discovery_tables();
 	return ret;
 }
 module_init(intel_uncore_init);
@@ -1737,5 +1753,6 @@ static void __exit intel_uncore_exit(void)
 	uncore_types_exit(uncore_msr_uncores);
 	uncore_types_exit(uncore_mmio_uncores);
 	uncore_pci_exit();
+	intel_uncore_clear_discovery_tables();
 }
 module_exit(intel_uncore_exit);
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
new file mode 100644
index 0000000..7519ce3
--- /dev/null
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -0,0 +1,318 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * Support Intel uncore PerfMon discovery mechanism.
+ * Copyright(c) 2021 Intel Corporation.
+ */
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include "uncore.h"
+#include "uncore_discovery.h"
+
+static struct rb_root discovery_tables = RB_ROOT;
+static int num_discovered_types[UNCORE_ACCESS_MAX];
+
+static bool has_generic_discovery_table(void)
+{
+	struct pci_dev *dev;
+	int dvsec;
+
+	dev = pci_get_device(PCI_VENDOR_ID_INTEL, UNCORE_DISCOVERY_TABLE_DEVICE, NULL);
+	if (!dev)
+		return false;
+
+	/* A discovery table device has the unique capability ID. */
+	dvsec = pci_find_next_ext_capability(dev, 0, UNCORE_EXT_CAP_ID_DISCOVERY);
+	pci_dev_put(dev);
+	if (dvsec)
+		return true;
+
+	return false;
+}
+
+static int logical_die_id;
+
+static int get_device_die_id(struct pci_dev *dev)
+{
+	int cpu, node = pcibus_to_node(dev->bus);
+
+	/*
+	 * If the NUMA info is not available, assume that the logical die id is
+	 * continuous in the order in which the discovery table devices are
+	 * detected.
+	 */
+	if (node < 0)
+		return logical_die_id++;
+
+	for_each_cpu(cpu, cpumask_of_node(node)) {
+		struct cpuinfo_x86 *c = &cpu_data(cpu);
+
+		if (c->initialized && cpu_to_node(cpu) == node)
+			return c->logical_die_id;
+	}
+
+	/*
+	 * All CPUs of a node may be offlined. For this case,
+	 * the PCI and MMIO type of uncore blocks which are
+	 * enumerated by the device will be unavailable.
+	 */
+	return -1;
+}
+
+#define __node_2_type(cur)	\
+	rb_entry((cur), struct intel_uncore_discovery_type, node)
+
+static inline int __type_cmp(const void *key, const struct rb_node *b)
+{
+	struct intel_uncore_discovery_type *type_b = __node_2_type(b);
+	const u16 *type_id = key;
+
+	if (type_b->type > *type_id)
+		return -1;
+	else if (type_b->type < *type_id)
+		return 1;
+
+	return 0;
+}
+
+static inline struct intel_uncore_discovery_type *
+search_uncore_discovery_type(u16 type_id)
+{
+	struct rb_node *node = rb_find(&type_id, &discovery_tables, __type_cmp);
+
+	return (node) ? __node_2_type(node) : NULL;
+}
+
+static inline bool __type_less(struct rb_node *a, const struct rb_node *b)
+{
+	return (__node_2_type(a)->type < __node_2_type(b)->type);
+}
+
+static struct intel_uncore_discovery_type *
+add_uncore_discovery_type(struct uncore_unit_discovery *unit)
+{
+	struct intel_uncore_discovery_type *type;
+
+	if (unit->access_type >= UNCORE_ACCESS_MAX) {
+		pr_warn("Unsupported access type %d\n", unit->access_type);
+		return NULL;
+	}
+
+	type = kzalloc(sizeof(struct intel_uncore_discovery_type), GFP_KERNEL);
+	if (!type)
+		return NULL;
+
+	type->box_ctrl_die = kcalloc(__uncore_max_dies, sizeof(u64), GFP_KERNEL);
+	if (!type->box_ctrl_die)
+		goto free_type;
+
+	type->access_type = unit->access_type;
+	num_discovered_types[type->access_type]++;
+	type->type = unit->box_type;
+
+	rb_add(&type->node, &discovery_tables, __type_less);
+
+	return type;
+
+free_type:
+	kfree(type);
+
+	return NULL;
+
+}
+
+static struct intel_uncore_discovery_type *
+get_uncore_discovery_type(struct uncore_unit_discovery *unit)
+{
+	struct intel_uncore_discovery_type *type;
+
+	type = search_uncore_discovery_type(unit->box_type);
+	if (type)
+		return type;
+
+	return add_uncore_discovery_type(unit);
+}
+
+static void
+uncore_insert_box_info(struct uncore_unit_discovery *unit,
+		       int die, bool parsed)
+{
+	struct intel_uncore_discovery_type *type;
+	unsigned int *box_offset, *ids;
+	int i;
+
+	if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset || !unit->ctr_offset))
+		return;
+
+	if (parsed) {
+		type = search_uncore_discovery_type(unit->box_type);
+		if (WARN_ON_ONCE(!type))
+			return;
+		/* Store the first box of each die */
+		if (!type->box_ctrl_die[die])
+			type->box_ctrl_die[die] = unit->ctl;
+		return;
+	}
+
+	type = get_uncore_discovery_type(unit);
+	if (!type)
+		return;
+
+	box_offset = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
+	if (!box_offset)
+		return;
+
+	ids = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
+	if (!ids)
+		goto free_box_offset;
+
+	/* Store generic information for the first box */
+	if (!type->num_boxes) {
+		type->box_ctrl = unit->ctl;
+		type->box_ctrl_die[die] = unit->ctl;
+		type->num_counters = unit->num_regs;
+		type->counter_width = unit->bit_width;
+		type->ctl_offset = unit->ctl_offset;
+		type->ctr_offset = unit->ctr_offset;
+		*ids = unit->box_id;
+		goto end;
+	}
+
+	for (i = 0; i < type->num_boxes; i++) {
+		ids[i] = type->ids[i];
+		box_offset[i] = type->box_offset[i];
+
+		if (WARN_ON_ONCE(unit->box_id == ids[i]))
+			goto free_ids;
+	}
+	ids[i] = unit->box_id;
+	box_offset[i] = unit->ctl - type->box_ctrl;
+	kfree(type->ids);
+	kfree(type->box_offset);
+end:
+	type->ids = ids;
+	type->box_offset = box_offset;
+	type->num_boxes++;
+	return;
+
+free_ids:
+	kfree(ids);
+
+free_box_offset:
+	kfree(box_offset);
+
+}
+
+static int parse_discovery_table(struct pci_dev *dev, int die,
+				 u32 bar_offset, bool *parsed)
+{
+	struct uncore_global_discovery global;
+	struct uncore_unit_discovery unit;
+	void __iomem *io_addr;
+	resource_size_t addr;
+	unsigned long size;
+	u32 val;
+	int i;
+
+	pci_read_config_dword(dev, bar_offset, &val);
+
+	if (val & UNCORE_DISCOVERY_MASK)
+		return -EINVAL;
+
+	addr = (resource_size_t)(val & ~UNCORE_DISCOVERY_MASK);
+	size = UNCORE_DISCOVERY_GLOBAL_MAP_SIZE;
+	io_addr = ioremap(addr, size);
+	if (!io_addr)
+		return -ENOMEM;
+
+	/* Read Global Discovery State */
+	memcpy_fromio(&global, io_addr, sizeof(struct uncore_global_discovery));
+	if (uncore_discovery_invalid_unit(global)) {
+		pr_info("Invalid Global Discovery State: 0x%llx 0x%llx 0x%llx\n",
+			global.table1, global.ctl, global.table3);
+		iounmap(io_addr);
+		return -EINVAL;
+	}
+	iounmap(io_addr);
+
+	size = (1 + global.max_units) * global.stride * 8;
+	io_addr = ioremap(addr, size);
+	if (!io_addr)
+		return -ENOMEM;
+
+	/* Parsing Unit Discovery State */
+	for (i = 0; i < global.max_units; i++) {
+		memcpy_fromio(&unit, io_addr + (i + 1) * (global.stride * 8),
+			      sizeof(struct uncore_unit_discovery));
+
+		if (uncore_discovery_invalid_unit(unit))
+			continue;
+
+		if (unit.access_type >= UNCORE_ACCESS_MAX)
+			continue;
+
+		uncore_insert_box_info(&unit, die, *parsed);
+	}
+
+	*parsed = true;
+	iounmap(io_addr);
+	return 0;
+}
+
+bool intel_uncore_has_discovery_tables(void)
+{
+	u32 device, val, entry_id, bar_offset;
+	int die, dvsec = 0, ret = true;
+	struct pci_dev *dev = NULL;
+	bool parsed = false;
+
+	if (has_generic_discovery_table())
+		device = UNCORE_DISCOVERY_TABLE_DEVICE;
+	else
+		device = PCI_ANY_ID;
+
+	/*
+	 * Start a new search and iterates through the list of
+	 * the discovery table devices.
+	 */
+	while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
+		while ((dvsec = pci_find_next_ext_capability(dev, dvsec, UNCORE_EXT_CAP_ID_DISCOVERY))) {
+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC_OFFSET, &val);
+			entry_id = val & UNCORE_DISCOVERY_DVSEC_ID_MASK;
+			if (entry_id != UNCORE_DISCOVERY_DVSEC_ID_PMON)
+				continue;
+
+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC2_OFFSET, &val);
+
+			if (val & ~UNCORE_DISCOVERY_DVSEC2_BIR_MASK) {
+				ret = false;
+				goto err;
+			}
+			bar_offset = UNCORE_DISCOVERY_BIR_BASE +
+				     (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
+
+			die = get_device_die_id(dev);
+			if (die < 0)
+				continue;
+
+			parse_discovery_table(dev, die, bar_offset, &parsed);
+		}
+	}
+
+	/* None of the discovery tables are available */
+	if (!parsed)
+		ret = false;
+err:
+	pci_dev_put(dev);
+
+	return ret;
+}
+
+void intel_uncore_clear_discovery_tables(void)
+{
+	struct intel_uncore_discovery_type *type, *next;
+
+	rbtree_postorder_for_each_entry_safe(type, next, &discovery_tables, node) {
+		kfree(type->box_ctrl_die);
+		kfree(type);
+	}
+}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
new file mode 100644
index 0000000..95afa39
--- /dev/null
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -0,0 +1,105 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+
+/* Generic device ID of a discovery table device */
+#define UNCORE_DISCOVERY_TABLE_DEVICE		0x09a7
+/* Capability ID for a discovery table device */
+#define UNCORE_EXT_CAP_ID_DISCOVERY		0x23
+/* First DVSEC offset */
+#define UNCORE_DISCOVERY_DVSEC_OFFSET		0x8
+/* Mask of the supported discovery entry type */
+#define UNCORE_DISCOVERY_DVSEC_ID_MASK		0xffff
+/* PMON discovery entry type ID */
+#define UNCORE_DISCOVERY_DVSEC_ID_PMON		0x1
+/* Second DVSEC offset */
+#define UNCORE_DISCOVERY_DVSEC2_OFFSET		0xc
+/* Mask of the discovery table BAR offset */
+#define UNCORE_DISCOVERY_DVSEC2_BIR_MASK	0x7
+/* Discovery table BAR base offset */
+#define UNCORE_DISCOVERY_BIR_BASE		0x10
+/* Discovery table BAR step */
+#define UNCORE_DISCOVERY_BIR_STEP		0x4
+/* Mask of the discovery table offset */
+#define UNCORE_DISCOVERY_MASK			0xf
+/* Global discovery table size */
+#define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE	0x20
+
+#define uncore_discovery_invalid_unit(unit)			\
+	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
+	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
+	 unit.table3 == -1ULL)
+
+enum uncore_access_type {
+	UNCORE_ACCESS_MSR	= 0,
+	UNCORE_ACCESS_MMIO,
+	UNCORE_ACCESS_PCI,
+
+	UNCORE_ACCESS_MAX,
+};
+
+struct uncore_global_discovery {
+	union {
+		u64	table1;
+		struct {
+			u64	type : 8,
+				stride : 8,
+				max_units : 10,
+				__reserved_1 : 36,
+				access_type : 2;
+		};
+	};
+
+	u64	ctl;		/* Global Control Address */
+
+	union {
+		u64	table3;
+		struct {
+			u64	status_offset : 8,
+				num_status : 16,
+				__reserved_2 : 40;
+		};
+	};
+};
+
+struct uncore_unit_discovery {
+	union {
+		u64	table1;
+		struct {
+			u64	num_regs : 8,
+				ctl_offset : 8,
+				bit_width : 8,
+				ctr_offset : 8,
+				status_offset : 8,
+				__reserved_1 : 22,
+				access_type : 2;
+			};
+		};
+
+	u64	ctl;		/* Unit Control Address */
+
+	union {
+		u64	table3;
+		struct {
+			u64	box_type : 16,
+				box_id : 16,
+				__reserved_2 : 32;
+		};
+	};
+};
+
+struct intel_uncore_discovery_type {
+	struct rb_node	node;
+	enum uncore_access_type	access_type;
+	u64		box_ctrl;	/* Unit ctrl addr of the first box */
+	u64		*box_ctrl_die;	/* Unit ctrl addr of the first box of each die */
+	u16		type;		/* Type ID of the uncore block */
+	u8		num_counters;
+	u8		counter_width;
+	u8		ctl_offset;	/* Counter Control 0 offset */
+	u8		ctr_offset;	/* Counter 0 offset */
+	u16		num_boxes;	/* number of boxes for the uncore block */
+	unsigned int	*ids;		/* Box IDs */
+	unsigned int	*box_offset;	/* Box offset */
+};
+
+bool intel_uncore_has_discovery_tables(void);
+void intel_uncore_clear_discovery_tables(void);


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
  2021-03-19  1:10   ` Namhyung Kim
  2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
@ 2022-07-22 12:55   ` Lucas De Marchi
  2022-07-22 13:04     ` Liang, Kan
  2 siblings, 1 reply; 22+ messages in thread
From: Lucas De Marchi @ 2022-07-22 12:55 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu

Hi Kan,

On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com wrote:
>From: Kan Liang <kan.liang@linux.intel.com>
>
>A self-describing mechanism for the uncore PerfMon hardware has been
>introduced with the latest Intel platforms. By reading through an MMIO
>page worth of information, perf can 'discover' all the standard uncore
>PerfMon registers in a machine.
>
>The discovery mechanism relies on BIOS's support. With a proper BIOS,
>a PCI device with the unique capability ID 0x23 can be found on each
>die. Perf can retrieve the information of all available uncore PerfMons
>from the device via MMIO. The information is composed of one global
>discovery table and several unit discovery tables.
>- The global discovery table includes global uncore information of the
>  die, e.g., the address of the global control register, the offset of
>  the global status register, the number of uncore units, the offset of
>  unit discovery tables, etc.
>- The unit discovery table includes generic uncore unit information,
>  e.g., the access type, the counter width, the address of counters,
>  the address of the counter control, the unit ID, the unit type, etc.
>  The unit is also called "box" in the code.
>Perf can provide basic uncore support based on this information
>with the following patches.
>
>To locate the PCI device with the discovery tables, check the generic
>PCI ID first. If it doesn't match, go through the entire PCI device tree
>and locate the device with the unique capability ID.
>
>The uncore information is similar among dies. To save parsing time and
>space, only completely parse and store the discovery tables on the first
>die and the first box of each die. The parsed information is stored in
>an
>RB tree structure, intel_uncore_discovery_type. The size of the stored
>discovery tables varies among platforms. It's around 4KB for a Sapphire
>Rapids server.
>
>If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
>will exit with -ENODEV. There is nothing changed.
>
>Add a module parameter to disable the discovery feature. If a BIOS gets
>the discovery tables wrong, users can have an option to disable the
>feature. For the current patchset, the uncore driver will exit with
>-ENODEV. In the future, it may fall back to the hardcode uncore driver
>on a known platform.
>
>Signed-off-by: Kan Liang <kan.liang@linux.intel.com>

I observed an issue when upgrading a kernel from 5.10 to 5.15 and, after
bisecting, arrived at this commit. I also verified the same issue is
present in 5.19-rc7 and that the issue is gone when booting with
intel_uncore.uncore_no_discover.

The test system is an SPR host with a PVC GPU. The issue is that the PVC
is not reaching pkg C6 state, even when we put it in RC6 state. It seems
the PCIe link is not idling, preventing it from going to pkg C6.

PMON discovery in the BIOS is set to "auto".

We do see the following on dmesg while going through this code path:

	intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
	intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
	intel_uncore: Uncore type 6 box 0: Invalid box control address.
	intel_uncore: Uncore type 6 box 1: Invalid box control address.
	intel_uncore: Uncore type 6 box 2: Invalid box control address.
	intel_uncore: Uncore type 6 box 3: Invalid box control address.
	intel_uncore: Uncore type 6 box 4: Invalid box control address.
	intel_uncore: Uncore type 6 box 5: Invalid box control address.
	intel_uncore: Uncore type 6 box 6: Invalid box control address.
	intel_uncore: Uncore type 6 box 7: Invalid box control address.
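(For context on the messages above: an MMIO read that nothing decodes returns all ones, and the patch's uncore_discovery_invalid_unit() macro rejects exactly the all-zeros and all-ones patterns. A userspace restatement of that check, with made-up names:)

```c
#include <assert.h>
#include <stdint.h>

/*
 * Mirror of the three qwords checked by uncore_discovery_invalid_unit()
 * in the patch.
 */
struct disc_entry { uint64_t table1, ctl, table3; };

/*
 * An entry is invalid if any qword is all zeros or all ones; all ones
 * (-1ULL) is what an MMIO read returns when the target doesn't decode
 * the access.
 */
static int invalid_entry(struct disc_entry e)
{
	return !e.table1 || !e.ctl || !e.table3 ||
	       e.table1 == ~0ULL || e.ctl == ~0ULL || e.table3 == ~0ULL;
}
```

The three 0xffffffffffffffff values in the log are exactly this case: the global discovery state read back as all ones.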

Any idea what could be going wrong here?

thanks
Lucas De Marchi

>---
> arch/x86/events/intel/Makefile           |   2 +-
> arch/x86/events/intel/uncore.c           |  31 ++-
> arch/x86/events/intel/uncore_discovery.c | 318 +++++++++++++++++++++++++++++++
> arch/x86/events/intel/uncore_discovery.h | 105 ++++++++++
> 4 files changed, 448 insertions(+), 8 deletions(-)
> create mode 100644 arch/x86/events/intel/uncore_discovery.c
> create mode 100644 arch/x86/events/intel/uncore_discovery.h
>
>diff --git a/arch/x86/events/intel/Makefile b/arch/x86/events/intel/Makefile
>index e67a588..10bde6c 100644
>--- a/arch/x86/events/intel/Makefile
>+++ b/arch/x86/events/intel/Makefile
>@@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)		+= core.o bts.o
> obj-$(CONFIG_CPU_SUP_INTEL)		+= ds.o knc.o
> obj-$(CONFIG_CPU_SUP_INTEL)		+= lbr.o p4.o p6.o pt.o
> obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)	+= intel-uncore.o
>-intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o
>+intel-uncore-objs			:= uncore.o uncore_nhmex.o uncore_snb.o uncore_snbep.o uncore_discovery.o
> obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE)	+= intel-cstate.o
> intel-cstate-objs			:= cstate.o
>diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
>index 33c8180..d111370 100644
>--- a/arch/x86/events/intel/uncore.c
>+++ b/arch/x86/events/intel/uncore.c
>@@ -4,7 +4,12 @@
> #include <asm/cpu_device_id.h>
> #include <asm/intel-family.h>
> #include "uncore.h"
>+#include "uncore_discovery.h"
>
>+static bool uncore_no_discover;
>+module_param(uncore_no_discover, bool, 0);
>+MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore PerfMon discovery mechanism "
>+				     "(default: enable the discovery mechanism).");
> static struct intel_uncore_type *empty_uncore[] = { NULL, };
> struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
> struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
>@@ -1637,6 +1642,9 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
> 	.mmio_init = snr_uncore_mmio_init,
> };
>
>+static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
>+};
>+
> static const struct x86_cpu_id intel_uncore_match[] __initconst = {
> 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP,		&nhm_uncore_init),
> 	X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,		&nhm_uncore_init),
>@@ -1684,17 +1692,21 @@ static int __init intel_uncore_init(void)
> 	struct intel_uncore_init_fun *uncore_init;
> 	int pret = 0, cret = 0, mret = 0, ret;
>
>-	id = x86_match_cpu(intel_uncore_match);
>-	if (!id)
>-		return -ENODEV;
>-
> 	if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
> 		return -ENODEV;
>
> 	__uncore_max_dies =
> 		topology_max_packages() * topology_max_die_per_package();
>
>-	uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
>+	id = x86_match_cpu(intel_uncore_match);
>+	if (!id) {
>+		if (!uncore_no_discover && intel_uncore_has_discovery_tables())
>+			uncore_init = (struct intel_uncore_init_fun *)&generic_uncore_init;
>+		else
>+			return -ENODEV;
>+	} else
>+		uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
>+
> 	if (uncore_init->pci_init) {
> 		pret = uncore_init->pci_init();
> 		if (!pret)
>@@ -1711,8 +1723,10 @@ static int __init intel_uncore_init(void)
> 		mret = uncore_mmio_init();
> 	}
>
>-	if (cret && pret && mret)
>-		return -ENODEV;
>+	if (cret && pret && mret) {
>+		ret = -ENODEV;
>+		goto free_discovery;
>+	}
>
> 	/* Install hotplug callbacks to setup the targets for each package */
> 	ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE,
>@@ -1727,6 +1741,8 @@ static int __init intel_uncore_init(void)
> 	uncore_types_exit(uncore_msr_uncores);
> 	uncore_types_exit(uncore_mmio_uncores);
> 	uncore_pci_exit();
>+free_discovery:
>+	intel_uncore_clear_discovery_tables();
> 	return ret;
> }
> module_init(intel_uncore_init);
>@@ -1737,5 +1753,6 @@ static void __exit intel_uncore_exit(void)
> 	uncore_types_exit(uncore_msr_uncores);
> 	uncore_types_exit(uncore_mmio_uncores);
> 	uncore_pci_exit();
>+	intel_uncore_clear_discovery_tables();
> }
> module_exit(intel_uncore_exit);
>diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
>new file mode 100644
>index 0000000..9d5c8b2
>--- /dev/null
>+++ b/arch/x86/events/intel/uncore_discovery.c
>@@ -0,0 +1,318 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+/*
>+ * Support Intel uncore PerfMon discovery mechanism.
>+ * Copyright(c) 2021 Intel Corporation.
>+ */
>+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>+
>+#include "uncore.h"
>+#include "uncore_discovery.h"
>+
>+static struct rb_root discovery_tables = RB_ROOT;
>+static int num_discovered_types[UNCORE_ACCESS_MAX];
>+
>+static bool has_generic_discovery_table(void)
>+{
>+	struct pci_dev *dev;
>+	int dvsec;
>+
>+	dev = pci_get_device(PCI_VENDOR_ID_INTEL, UNCORE_DISCOVERY_TABLE_DEVICE, NULL);
>+	if (!dev)
>+		return false;
>+
>+	/* A discovery table device has the unique capability ID. */
>+	dvsec = pci_find_next_ext_capability(dev, 0, UNCORE_EXT_CAP_ID_DISCOVERY);
>+	pci_dev_put(dev);
>+	if (dvsec)
>+		return true;
>+
>+	return false;
>+}
>+
>+static int logical_die_id;
>+
>+static int get_device_die_id(struct pci_dev *dev)
>+{
>+	int cpu, node = pcibus_to_node(dev->bus);
>+
>+	/*
>+	 * If the NUMA info is not available, assume that the logical die id is
>+	 * continuous in the order in which the discovery table devices are
>+	 * detected.
>+	 */
>+	if (node < 0)
>+		return logical_die_id++;
>+
>+	for_each_cpu(cpu, cpumask_of_node(node)) {
>+		struct cpuinfo_x86 *c = &cpu_data(cpu);
>+
>+		if (c->initialized && cpu_to_node(cpu) == node)
>+			return c->logical_die_id;
>+	}
>+
>+	/*
>+	 * All CPUs of a node may be offlined. For this case,
>+	 * the PCI and MMIO type of uncore blocks which are
>+	 * enumerated by the device will be unavailable.
>+	 */
>+	return -1;
>+}
>+
>+#define __node_2_type(cur)	\
>+	rb_entry((cur), struct intel_uncore_discovery_type, node)
>+
>+static inline int __type_cmp(const void *key, const struct rb_node *b)
>+{
>+	struct intel_uncore_discovery_type *type_b = __node_2_type(b);
>+	const u16 *type_id = key;
>+
>+	if (type_b->type > *type_id)
>+		return -1;
>+	else if (type_b->type < *type_id)
>+		return 1;
>+
>+	return 0;
>+}
>+
>+static inline struct intel_uncore_discovery_type *
>+search_uncore_discovery_type(u16 type_id)
>+{
>+	struct rb_node *node = rb_find(&type_id, &discovery_tables, __type_cmp);
>+
>+	return (node) ? __node_2_type(node) : NULL;
>+}
>+
>+static inline bool __type_less(struct rb_node *a, const struct rb_node *b)
>+{
>+	return (__node_2_type(a)->type < __node_2_type(b)->type) ? true : false;
>+}
>+
>+static struct intel_uncore_discovery_type *
>+add_uncore_discovery_type(struct uncore_unit_discovery *unit)
>+{
>+	struct intel_uncore_discovery_type *type;
>+
>+	if (unit->access_type >= UNCORE_ACCESS_MAX) {
>+		pr_warn("Unsupported access type %d\n", unit->access_type);
>+		return NULL;
>+	}
>+
>+	type = kzalloc(sizeof(struct intel_uncore_discovery_type), GFP_KERNEL);
>+	if (!type)
>+		return NULL;
>+
>+	type->box_ctrl_die = kcalloc(__uncore_max_dies, sizeof(u64), GFP_KERNEL);
>+	if (!type->box_ctrl_die)
>+		goto free_type;
>+
>+	type->access_type = unit->access_type;
>+	num_discovered_types[type->access_type]++;
>+	type->type = unit->box_type;
>+
>+	rb_add(&type->node, &discovery_tables, __type_less);
>+
>+	return type;
>+
>+free_type:
>+	kfree(type);
>+
>+	return NULL;
>+
>+}
>+
>+static struct intel_uncore_discovery_type *
>+get_uncore_discovery_type(struct uncore_unit_discovery *unit)
>+{
>+	struct intel_uncore_discovery_type *type;
>+
>+	type = search_uncore_discovery_type(unit->box_type);
>+	if (type)
>+		return type;
>+
>+	return add_uncore_discovery_type(unit);
>+}
>+
>+static void
>+uncore_insert_box_info(struct uncore_unit_discovery *unit,
>+		       int die, bool parsed)
>+{
>+	struct intel_uncore_discovery_type *type;
>+	unsigned int *box_offset, *ids;
>+	int i;
>+
>+	if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset || !unit->ctr_offset))
>+		return;
>+
>+	if (parsed) {
>+		type = search_uncore_discovery_type(unit->box_type);
>+		if (WARN_ON_ONCE(!type))
>+			return;
>+		/* Store the first box of each die */
>+		if (!type->box_ctrl_die[die])
>+			type->box_ctrl_die[die] = unit->ctl;
>+		return;
>+	}
>+
>+	type = get_uncore_discovery_type(unit);
>+	if (!type)
>+		return;
>+
>+	box_offset = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
>+	if (!box_offset)
>+		return;
>+
>+	ids = kcalloc(type->num_boxes + 1, sizeof(unsigned int), GFP_KERNEL);
>+	if (!ids)
>+		goto free_box_offset;
>+
>+	/* Store generic information for the first box */
>+	if (!type->num_boxes) {
>+		type->box_ctrl = unit->ctl;
>+		type->box_ctrl_die[die] = unit->ctl;
>+		type->num_counters = unit->num_regs;
>+		type->counter_width = unit->bit_width;
>+		type->ctl_offset = unit->ctl_offset;
>+		type->ctr_offset = unit->ctr_offset;
>+		*ids = unit->box_id;
>+		goto end;
>+	}
>+
>+	for (i = 0; i < type->num_boxes; i++) {
>+		ids[i] = type->ids[i];
>+		box_offset[i] = type->box_offset[i];
>+
>+		if (WARN_ON_ONCE(unit->box_id == ids[i]))
>+			goto free_ids;
>+	}
>+	ids[i] = unit->box_id;
>+	box_offset[i] = unit->ctl - type->box_ctrl;
>+	kfree(type->ids);
>+	kfree(type->box_offset);
>+end:
>+	type->ids = ids;
>+	type->box_offset = box_offset;
>+	type->num_boxes++;
>+	return;
>+
>+free_ids:
>+	kfree(ids);
>+
>+free_box_offset:
>+	kfree(box_offset);
>+
>+}
>+
>+static int parse_discovery_table(struct pci_dev *dev, int die,
>+				 u32 bar_offset, bool *parsed)
>+{
>+	struct uncore_global_discovery global;
>+	struct uncore_unit_discovery unit;
>+	void __iomem *io_addr;
>+	resource_size_t addr;
>+	unsigned long size;
>+	u32 val;
>+	int i;
>+
>+	pci_read_config_dword(dev, bar_offset, &val);
>+
>+	if (val & UNCORE_DISCOVERY_MASK)
>+		return -EINVAL;
>+
>+	addr = (resource_size_t)(val & ~UNCORE_DISCOVERY_MASK);
>+	size = UNCORE_DISCOVERY_GLOBAL_MAP_SIZE;
>+	io_addr = ioremap(addr, size);
>+	if (!io_addr)
>+		return -ENOMEM;
>+
>+	/* Read Global Discovery State */
>+	memcpy_fromio(&global, io_addr, sizeof(struct uncore_global_discovery));
>+	if (uncore_discovery_invalid_unit(global)) {
>+		pr_info("Invalid Global Discovery State: 0x%llx 0x%llx 0x%llx\n",
>+			global.table1, global.ctl, global.table3);
>+		iounmap(io_addr);
>+		return -EINVAL;
>+	}
>+	iounmap(io_addr);
>+
>+	size = (1 + global.max_units) * global.stride * 8;
>+	io_addr = ioremap(addr, size);
>+	if (!io_addr)
>+		return -ENOMEM;
>+
>+	/* Parsing Unit Discovery State */
>+	for (i = 0; i < global.max_units; i++) {
>+		memcpy_fromio(&unit, io_addr + (i + 1) * (global.stride * 8),
>+			      sizeof(struct uncore_unit_discovery));
>+
>+		if (uncore_discovery_invalid_unit(unit))
>+			continue;
>+
>+		if (unit.access_type >= UNCORE_ACCESS_MAX)
>+			continue;
>+
>+		uncore_insert_box_info(&unit, die, *parsed);
>+	}
>+
>+	*parsed = true;
>+	iounmap(io_addr);
>+	return 0;
>+}
>+
>+bool intel_uncore_has_discovery_tables(void)
>+{
>+	u32 device, val, entry_id, bar_offset;
>+	int die, dvsec = 0, ret = true;
>+	struct pci_dev *dev = NULL;
>+	bool parsed = false;
>+
>+	if (has_generic_discovery_table())
>+		device = UNCORE_DISCOVERY_TABLE_DEVICE;
>+	else
>+		device = PCI_ANY_ID;
>+
>+	/*
>+	 * Start a new search and iterates through the list of
>+	 * the discovery table devices.
>+	 */
>+	while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>+		while ((dvsec = pci_find_next_ext_capability(dev, dvsec, UNCORE_EXT_CAP_ID_DISCOVERY))) {
>+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC_OFFSET, &val);
>+			entry_id = val & UNCORE_DISCOVERY_DVSEC_ID_MASK;
>+			if (entry_id != UNCORE_DISCOVERY_DVSEC_ID_PMON)
>+				continue;
>+
>+			pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC2_OFFSET, &val);
>+
>+			if (val & ~UNCORE_DISCOVERY_DVSEC2_BIR_MASK) {
>+				ret = false;
>+				goto err;
>+			}
>+			bar_offset = UNCORE_DISCOVERY_BIR_BASE +
>+				     (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
>+
>+			die = get_device_die_id(dev);
>+			if (die < 0)
>+				continue;
>+
>+			parse_discovery_table(dev, die, bar_offset, &parsed);
>+		}
>+	}
>+
>+	/* None of the discovery tables are available */
>+	if (!parsed)
>+		ret = false;
>+err:
>+	pci_dev_put(dev);
>+
>+	return ret;
>+}
>+
>+void intel_uncore_clear_discovery_tables(void)
>+{
>+	struct intel_uncore_discovery_type *type, *next;
>+
>+	rbtree_postorder_for_each_entry_safe(type, next, &discovery_tables, node) {
>+		kfree(type->box_ctrl_die);
>+		kfree(type);
>+	}
>+}
>diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
>new file mode 100644
>index 0000000..95afa39
>--- /dev/null
>+++ b/arch/x86/events/intel/uncore_discovery.h
>@@ -0,0 +1,105 @@
>+/* SPDX-License-Identifier: GPL-2.0-only */
>+
>+/* Generic device ID of a discovery table device */
>+#define UNCORE_DISCOVERY_TABLE_DEVICE		0x09a7
>+/* Capability ID for a discovery table device */
>+#define UNCORE_EXT_CAP_ID_DISCOVERY		0x23
>+/* First DVSEC offset */
>+#define UNCORE_DISCOVERY_DVSEC_OFFSET		0x8
>+/* Mask of the supported discovery entry type */
>+#define UNCORE_DISCOVERY_DVSEC_ID_MASK		0xffff
>+/* PMON discovery entry type ID */
>+#define UNCORE_DISCOVERY_DVSEC_ID_PMON		0x1
>+/* Second DVSEC offset */
>+#define UNCORE_DISCOVERY_DVSEC2_OFFSET		0xc
>+/* Mask of the discovery table BAR offset */
>+#define UNCORE_DISCOVERY_DVSEC2_BIR_MASK	0x7
>+/* Discovery table BAR base offset */
>+#define UNCORE_DISCOVERY_BIR_BASE		0x10
>+/* Discovery table BAR step */
>+#define UNCORE_DISCOVERY_BIR_STEP		0x4
>+/* Mask of the discovery table offset */
>+#define UNCORE_DISCOVERY_MASK			0xf
>+/* Global discovery table size */
>+#define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE	0x20
>+
>+#define uncore_discovery_invalid_unit(unit)			\
>+	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
>+	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
>+	 unit.table3 == -1ULL)
>+
>+enum uncore_access_type {
>+	UNCORE_ACCESS_MSR	= 0,
>+	UNCORE_ACCESS_MMIO,
>+	UNCORE_ACCESS_PCI,
>+
>+	UNCORE_ACCESS_MAX,
>+};
>+
>+struct uncore_global_discovery {
>+	union {
>+		u64	table1;
>+		struct {
>+			u64	type : 8,
>+				stride : 8,
>+				max_units : 10,
>+				__reserved_1 : 36,
>+				access_type : 2;
>+		};
>+	};
>+
>+	u64	ctl;		/* Global Control Address */
>+
>+	union {
>+		u64	table3;
>+		struct {
>+			u64	status_offset : 8,
>+				num_status : 16,
>+				__reserved_2 : 40;
>+		};
>+	};
>+};
>+
>+struct uncore_unit_discovery {
>+	union {
>+		u64	table1;
>+		struct {
>+			u64	num_regs : 8,
>+				ctl_offset : 8,
>+				bit_width : 8,
>+				ctr_offset : 8,
>+				status_offset : 8,
>+				__reserved_1 : 22,
>+				access_type : 2;
>+			};
>+		};
>+
>+	u64	ctl;		/* Unit Control Address */
>+
>+	union {
>+		u64	table3;
>+		struct {
>+			u64	box_type : 16,
>+				box_id : 16,
>+				__reserved_2 : 32;
>+		};
>+	};
>+};
>+
>+struct intel_uncore_discovery_type {
>+	struct rb_node	node;
>+	enum uncore_access_type	access_type;
>+	u64		box_ctrl;	/* Unit ctrl addr of the first box */
>+	u64		*box_ctrl_die;	/* Unit ctrl addr of the first box of each die */
>+	u16		type;		/* Type ID of the uncore block */
>+	u8		num_counters;
>+	u8		counter_width;
>+	u8		ctl_offset;	/* Counter Control 0 offset */
>+	u8		ctr_offset;	/* Counter 0 offset */
>+	u16		num_boxes;	/* number of boxes for the uncore block */
>+	unsigned int	*ids;		/* Box IDs */
>+	unsigned int	*box_offset;	/* Box offset */
>+};
>+
>+bool intel_uncore_has_discovery_tables(void);
>+void intel_uncore_clear_discovery_tables(void);
>-- 
>2.7.4
>


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-07-22 12:55   ` [PATCH V2 1/5] " Lucas De Marchi
@ 2022-07-22 13:04     ` Liang, Kan
  2022-07-23 18:56       ` Lucas De Marchi
  0 siblings, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2022-07-22 13:04 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu



On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
> Hi Kan,
> 
> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com wrote:
>> From: Kan Liang <kan.liang@linux.intel.com>
>>
>> A self-describing mechanism for the uncore PerfMon hardware has been
>> introduced with the latest Intel platforms. By reading through an MMIO
>> page worth of information, perf can 'discover' all the standard uncore
>> PerfMon registers in a machine.
>>
>> The discovery mechanism relies on BIOS's support. With a proper BIOS,
>> a PCI device with the unique capability ID 0x23 can be found on each
>> die. Perf can retrieve the information of all available uncore PerfMons
>> from the device via MMIO. The information is composed of one global
>> discovery table and several unit discovery tables.
>> - The global discovery table includes global uncore information of the
>>  die, e.g., the address of the global control register, the offset of
>>  the global status register, the number of uncore units, the offset of
>>  unit discovery tables, etc.
>> - The unit discovery table includes generic uncore unit information,
>>  e.g., the access type, the counter width, the address of counters,
>>  the address of the counter control, the unit ID, the unit type, etc.
>>  The unit is also called "box" in the code.
>> Perf can provide basic uncore support based on this information
>> with the following patches.
>>
>> To locate the PCI device with the discovery tables, check the generic
>> PCI ID first. If it doesn't match, go through the entire PCI device tree
>> and locate the device with the unique capability ID.
>>
>> The uncore information is similar among dies. To save parsing time and
>> space, only completely parse and store the discovery tables on the first
>> die and the first box of each die. The parsed information is stored in
>> an RB tree structure, intel_uncore_discovery_type. The size of the stored
>> discovery tables varies among platforms. It's around 4KB for a Sapphire
>> Rapids server.
>>
>> If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
>> will exit with -ENODEV. There is nothing changed.
>>
>> Add a module parameter to disable the discovery feature. If a BIOS gets
>> the discovery tables wrong, users can have an option to disable the
>> feature. For the current patchset, the uncore driver will exit with
>> -ENODEV. In the future, it may fall back to the hardcode uncore driver
>> on a known platform.
>>
>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> 
> I observed an issue when upgrading a kernel from 5.10 to 5.15 and, after
> bisecting, arrived at this commit. I also verified the same issue is
> present in 5.19-rc7 and that the issue is gone when booting with
> intel_uncore.uncore_no_discover.
> 
> The test system is an SPR host with a PVC GPU. The issue is that the PVC
> is not reaching pkg C6 state, even when we put it in RC6 state. It seems
> the PCIe link is not idling, preventing it from going to pkg C6.
> 
> PMON discovery in bios is set to "auto".
> 
> We do see the following on dmesg while going through this code path:
> 
>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff

On SPR, the uncore driver relies on the discovery table provided by the
BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
you please update to the latest BIOS/firmware and have a try?

Thanks,
Kan

>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff 0xffffffffffffffff 0xffffffffffffffff
>     intel_uncore: Uncore type 6 box 0: Invalid box control address.
>     intel_uncore: Uncore type 6 box 1: Invalid box control address.
>     intel_uncore: Uncore type 6 box 2: Invalid box control address.
>     intel_uncore: Uncore type 6 box 3: Invalid box control address.
>     intel_uncore: Uncore type 6 box 4: Invalid box control address.
>     intel_uncore: Uncore type 6 box 5: Invalid box control address.
>     intel_uncore: Uncore type 6 box 6: Invalid box control address.
>     intel_uncore: Uncore type 6 box 7: Invalid box control address.
> 
> Any idea what could be going wrong here?
> 
> thanks
> Lucas De Marchi
> 
>> ---
>> arch/x86/events/intel/Makefile           |   2 +-
>> arch/x86/events/intel/uncore.c           |  31 ++-
>> arch/x86/events/intel/uncore_discovery.c | 318
>> +++++++++++++++++++++++++++++++
>> arch/x86/events/intel/uncore_discovery.h | 105 ++++++++++
>> 4 files changed, 448 insertions(+), 8 deletions(-)
>> create mode 100644 arch/x86/events/intel/uncore_discovery.c
>> create mode 100644 arch/x86/events/intel/uncore_discovery.h
>>
>> diff --git a/arch/x86/events/intel/Makefile
>> b/arch/x86/events/intel/Makefile
>> index e67a588..10bde6c 100644
>> --- a/arch/x86/events/intel/Makefile
>> +++ b/arch/x86/events/intel/Makefile
>> @@ -3,6 +3,6 @@ obj-$(CONFIG_CPU_SUP_INTEL)        += core.o bts.o
>> obj-$(CONFIG_CPU_SUP_INTEL)        += ds.o knc.o
>> obj-$(CONFIG_CPU_SUP_INTEL)        += lbr.o p4.o p6.o pt.o
>> obj-$(CONFIG_PERF_EVENTS_INTEL_UNCORE)    += intel-uncore.o
>> -intel-uncore-objs            := uncore.o uncore_nhmex.o uncore_snb.o
>> uncore_snbep.o
>> +intel-uncore-objs            := uncore.o uncore_nhmex.o uncore_snb.o
>> uncore_snbep.o uncore_discovery.o
>> obj-$(CONFIG_PERF_EVENTS_INTEL_CSTATE)    += intel-cstate.o
>> intel-cstate-objs            := cstate.o
>> diff --git a/arch/x86/events/intel/uncore.c
>> b/arch/x86/events/intel/uncore.c
>> index 33c8180..d111370 100644
>> --- a/arch/x86/events/intel/uncore.c
>> +++ b/arch/x86/events/intel/uncore.c
>> @@ -4,7 +4,12 @@
>> #include <asm/cpu_device_id.h>
>> #include <asm/intel-family.h>
>> #include "uncore.h"
>> +#include "uncore_discovery.h"
>>
>> +static bool uncore_no_discover;
>> +module_param(uncore_no_discover, bool, 0);
>> +MODULE_PARM_DESC(uncore_no_discover, "Don't enable the Intel uncore
>> PerfMon discovery mechanism "
>> +                     "(default: enable the discovery mechanism).");
>> static struct intel_uncore_type *empty_uncore[] = { NULL, };
>> struct intel_uncore_type **uncore_msr_uncores = empty_uncore;
>> struct intel_uncore_type **uncore_pci_uncores = empty_uncore;
>> @@ -1637,6 +1642,9 @@ static const struct intel_uncore_init_fun
>> snr_uncore_init __initconst = {
>>     .mmio_init = snr_uncore_mmio_init,
>> };
>>
>> +static const struct intel_uncore_init_fun generic_uncore_init
>> __initconst = {
>> +};
>> +
>> static const struct x86_cpu_id intel_uncore_match[] __initconst = {
>>     X86_MATCH_INTEL_FAM6_MODEL(NEHALEM_EP,        &nhm_uncore_init),
>>     X86_MATCH_INTEL_FAM6_MODEL(NEHALEM,        &nhm_uncore_init),
>> @@ -1684,17 +1692,21 @@ static int __init intel_uncore_init(void)
>>     struct intel_uncore_init_fun *uncore_init;
>>     int pret = 0, cret = 0, mret = 0, ret;
>>
>> -    id = x86_match_cpu(intel_uncore_match);
>> -    if (!id)
>> -        return -ENODEV;
>> -
>>     if (boot_cpu_has(X86_FEATURE_HYPERVISOR))
>>         return -ENODEV;
>>
>>     __uncore_max_dies =
>>         topology_max_packages() * topology_max_die_per_package();
>>
>> -    uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
>> +    id = x86_match_cpu(intel_uncore_match);
>> +    if (!id) {
>> +        if (!uncore_no_discover && intel_uncore_has_discovery_tables())
>> +            uncore_init = (struct intel_uncore_init_fun
>> *)&generic_uncore_init;
>> +        else
>> +            return -ENODEV;
>> +    } else
>> +        uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
>> +
>>     if (uncore_init->pci_init) {
>>         pret = uncore_init->pci_init();
>>         if (!pret)
>> @@ -1711,8 +1723,10 @@ static int __init intel_uncore_init(void)
>>         mret = uncore_mmio_init();
>>     }
>>
>> -    if (cret && pret && mret)
>> -        return -ENODEV;
>> +    if (cret && pret && mret) {
>> +        ret = -ENODEV;
>> +        goto free_discovery;
>> +    }
>>
>>     /* Install hotplug callbacks to setup the targets for each package */
>>     ret = cpuhp_setup_state(CPUHP_AP_PERF_X86_UNCORE_ONLINE,
>> @@ -1727,6 +1741,8 @@ static int __init intel_uncore_init(void)
>>     uncore_types_exit(uncore_msr_uncores);
>>     uncore_types_exit(uncore_mmio_uncores);
>>     uncore_pci_exit();
>> +free_discovery:
>> +    intel_uncore_clear_discovery_tables();
>>     return ret;
>> }
>> module_init(intel_uncore_init);
>> @@ -1737,5 +1753,6 @@ static void __exit intel_uncore_exit(void)
>>     uncore_types_exit(uncore_msr_uncores);
>>     uncore_types_exit(uncore_mmio_uncores);
>>     uncore_pci_exit();
>> +    intel_uncore_clear_discovery_tables();
>> }
>> module_exit(intel_uncore_exit);
>> diff --git a/arch/x86/events/intel/uncore_discovery.c
>> b/arch/x86/events/intel/uncore_discovery.c
>> new file mode 100644
>> index 0000000..9d5c8b2
>> --- /dev/null
>> +++ b/arch/x86/events/intel/uncore_discovery.c
>> @@ -0,0 +1,318 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +/*
>> + * Support Intel uncore PerfMon discovery mechanism.
>> + * Copyright(c) 2021 Intel Corporation.
>> + */
>> +#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
>> +
>> +#include "uncore.h"
>> +#include "uncore_discovery.h"
>> +
>> +static struct rb_root discovery_tables = RB_ROOT;
>> +static int num_discovered_types[UNCORE_ACCESS_MAX];
>> +
>> +static bool has_generic_discovery_table(void)
>> +{
>> +    struct pci_dev *dev;
>> +    int dvsec;
>> +
>> +    dev = pci_get_device(PCI_VENDOR_ID_INTEL,
>> UNCORE_DISCOVERY_TABLE_DEVICE, NULL);
>> +    if (!dev)
>> +        return false;
>> +
>> +    /* A discovery table device has the unique capability ID. */
>> +    dvsec = pci_find_next_ext_capability(dev, 0,
>> UNCORE_EXT_CAP_ID_DISCOVERY);
>> +    pci_dev_put(dev);
>> +    if (dvsec)
>> +        return true;
>> +
>> +    return false;
>> +}
>> +
>> +static int logical_die_id;
>> +
>> +static int get_device_die_id(struct pci_dev *dev)
>> +{
>> +    int cpu, node = pcibus_to_node(dev->bus);
>> +
>> +    /*
>> +     * If the NUMA info is not available, assume that the logical die
>> id is
>> +     * continuous in the order in which the discovery table devices are
>> +     * detected.
>> +     */
>> +    if (node < 0)
>> +        return logical_die_id++;
>> +
>> +    for_each_cpu(cpu, cpumask_of_node(node)) {
>> +        struct cpuinfo_x86 *c = &cpu_data(cpu);
>> +
>> +        if (c->initialized && cpu_to_node(cpu) == node)
>> +            return c->logical_die_id;
>> +    }
>> +
>> +    /*
>> +     * All CPUs of a node may be offlined. For this case,
>> +     * the PCI and MMIO type of uncore blocks which are
>> +     * enumerated by the device will be unavailable.
>> +     */
>> +    return -1;
>> +}
>> +
>> +#define __node_2_type(cur)    \
>> +    rb_entry((cur), struct intel_uncore_discovery_type, node)
>> +
>> +static inline int __type_cmp(const void *key, const struct rb_node *b)
>> +{
>> +    struct intel_uncore_discovery_type *type_b = __node_2_type(b);
>> +    const u16 *type_id = key;
>> +
>> +    if (type_b->type > *type_id)
>> +        return -1;
>> +    else if (type_b->type < *type_id)
>> +        return 1;
>> +
>> +    return 0;
>> +}
>> +
>> +static inline struct intel_uncore_discovery_type *
>> +search_uncore_discovery_type(u16 type_id)
>> +{
>> +    struct rb_node *node = rb_find(&type_id, &discovery_tables,
>> __type_cmp);
>> +
>> +    return (node) ? __node_2_type(node) : NULL;
>> +}
>> +
>> +static inline bool __type_less(struct rb_node *a, const struct
>> rb_node *b)
>> +{
>> +    return (__node_2_type(a)->type < __node_2_type(b)->type) ? true :
>> false;
>> +}
>> +
>> +static struct intel_uncore_discovery_type *
>> +add_uncore_discovery_type(struct uncore_unit_discovery *unit)
>> +{
>> +    struct intel_uncore_discovery_type *type;
>> +
>> +    if (unit->access_type >= UNCORE_ACCESS_MAX) {
>> +        pr_warn("Unsupported access type %d\n", unit->access_type);
>> +        return NULL;
>> +    }
>> +
>> +    type = kzalloc(sizeof(struct intel_uncore_discovery_type),
>> GFP_KERNEL);
>> +    if (!type)
>> +        return NULL;
>> +
>> +    type->box_ctrl_die = kcalloc(__uncore_max_dies, sizeof(u64),
>> GFP_KERNEL);
>> +    if (!type->box_ctrl_die)
>> +        goto free_type;
>> +
>> +    type->access_type = unit->access_type;
>> +    num_discovered_types[type->access_type]++;
>> +    type->type = unit->box_type;
>> +
>> +    rb_add(&type->node, &discovery_tables, __type_less);
>> +
>> +    return type;
>> +
>> +free_type:
>> +    kfree(type);
>> +
>> +    return NULL;
>> +
>> +}
>> +
>> +static struct intel_uncore_discovery_type *
>> +get_uncore_discovery_type(struct uncore_unit_discovery *unit)
>> +{
>> +    struct intel_uncore_discovery_type *type;
>> +
>> +    type = search_uncore_discovery_type(unit->box_type);
>> +    if (type)
>> +        return type;
>> +
>> +    return add_uncore_discovery_type(unit);
>> +}
>> +
>> +static void
>> +uncore_insert_box_info(struct uncore_unit_discovery *unit,
>> +               int die, bool parsed)
>> +{
>> +    struct intel_uncore_discovery_type *type;
>> +    unsigned int *box_offset, *ids;
>> +    int i;
>> +
>> +    if (WARN_ON_ONCE(!unit->ctl || !unit->ctl_offset ||
>> !unit->ctr_offset))
>> +        return;
>> +
>> +    if (parsed) {
>> +        type = search_uncore_discovery_type(unit->box_type);
>> +        if (WARN_ON_ONCE(!type))
>> +            return;
>> +        /* Store the first box of each die */
>> +        if (!type->box_ctrl_die[die])
>> +            type->box_ctrl_die[die] = unit->ctl;
>> +        return;
>> +    }
>> +
>> +    type = get_uncore_discovery_type(unit);
>> +    if (!type)
>> +        return;
>> +
>> +    box_offset = kcalloc(type->num_boxes + 1, sizeof(unsigned int),
>> GFP_KERNEL);
>> +    if (!box_offset)
>> +        return;
>> +
>> +    ids = kcalloc(type->num_boxes + 1, sizeof(unsigned int),
>> GFP_KERNEL);
>> +    if (!ids)
>> +        goto free_box_offset;
>> +
>> +    /* Store generic information for the first box */
>> +    if (!type->num_boxes) {
>> +        type->box_ctrl = unit->ctl;
>> +        type->box_ctrl_die[die] = unit->ctl;
>> +        type->num_counters = unit->num_regs;
>> +        type->counter_width = unit->bit_width;
>> +        type->ctl_offset = unit->ctl_offset;
>> +        type->ctr_offset = unit->ctr_offset;
>> +        *ids = unit->box_id;
>> +        goto end;
>> +    }
>> +
>> +    for (i = 0; i < type->num_boxes; i++) {
>> +        ids[i] = type->ids[i];
>> +        box_offset[i] = type->box_offset[i];
>> +
>> +        if (WARN_ON_ONCE(unit->box_id == ids[i]))
>> +            goto free_ids;
>> +    }
>> +    ids[i] = unit->box_id;
>> +    box_offset[i] = unit->ctl - type->box_ctrl;
>> +    kfree(type->ids);
>> +    kfree(type->box_offset);
>> +end:
>> +    type->ids = ids;
>> +    type->box_offset = box_offset;
>> +    type->num_boxes++;
>> +    return;
>> +
>> +free_ids:
>> +    kfree(ids);
>> +
>> +free_box_offset:
>> +    kfree(box_offset);
>> +
>> +}
>> +
>> +static int parse_discovery_table(struct pci_dev *dev, int die,
>> +                 u32 bar_offset, bool *parsed)
>> +{
>> +    struct uncore_global_discovery global;
>> +    struct uncore_unit_discovery unit;
>> +    void __iomem *io_addr;
>> +    resource_size_t addr;
>> +    unsigned long size;
>> +    u32 val;
>> +    int i;
>> +
>> +    pci_read_config_dword(dev, bar_offset, &val);
>> +
>> +    if (val & UNCORE_DISCOVERY_MASK)
>> +        return -EINVAL;
>> +
>> +    addr = (resource_size_t)(val & ~UNCORE_DISCOVERY_MASK);
>> +    size = UNCORE_DISCOVERY_GLOBAL_MAP_SIZE;
>> +    io_addr = ioremap(addr, size);
>> +    if (!io_addr)
>> +        return -ENOMEM;
>> +
>> +    /* Read Global Discovery State */
>> +    memcpy_fromio(&global, io_addr, sizeof(struct uncore_global_discovery));
>> +    if (uncore_discovery_invalid_unit(global)) {
>> +        pr_info("Invalid Global Discovery State: 0x%llx 0x%llx 0x%llx\n",
>> +            global.table1, global.ctl, global.table3);
>> +        iounmap(io_addr);
>> +        return -EINVAL;
>> +    }
>> +    iounmap(io_addr);
>> +
>> +    size = (1 + global.max_units) * global.stride * 8;
>> +    io_addr = ioremap(addr, size);
>> +    if (!io_addr)
>> +        return -ENOMEM;
>> +
>> +    /* Parsing Unit Discovery State */
>> +    for (i = 0; i < global.max_units; i++) {
>> +        memcpy_fromio(&unit, io_addr + (i + 1) * (global.stride * 8),
>> +                  sizeof(struct uncore_unit_discovery));
>> +
>> +        if (uncore_discovery_invalid_unit(unit))
>> +            continue;
>> +
>> +        if (unit.access_type >= UNCORE_ACCESS_MAX)
>> +            continue;
>> +
>> +        uncore_insert_box_info(&unit, die, *parsed);
>> +    }
>> +
>> +    *parsed = true;
>> +    iounmap(io_addr);
>> +    return 0;
>> +}
>> +
>> +bool intel_uncore_has_discovery_tables(void)
>> +{
>> +    u32 device, val, entry_id, bar_offset;
>> +    int die, dvsec = 0, ret = true;
>> +    struct pci_dev *dev = NULL;
>> +    bool parsed = false;
>> +
>> +    if (has_generic_discovery_table())
>> +        device = UNCORE_DISCOVERY_TABLE_DEVICE;
>> +    else
>> +        device = PCI_ANY_ID;
>> +
>> +    /*
>> +     * Start a new search and iterates through the list of
>> +     * the discovery table devices.
>> +     */
>> +    while ((dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, dev)) != NULL) {
>> +        while ((dvsec = pci_find_next_ext_capability(dev, dvsec, UNCORE_EXT_CAP_ID_DISCOVERY))) {
>> +            pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC_OFFSET, &val);
>> +            entry_id = val & UNCORE_DISCOVERY_DVSEC_ID_MASK;
>> +            if (entry_id != UNCORE_DISCOVERY_DVSEC_ID_PMON)
>> +                continue;
>> +
>> +            pci_read_config_dword(dev, dvsec + UNCORE_DISCOVERY_DVSEC2_OFFSET, &val);
>> +
>> +            if (val & ~UNCORE_DISCOVERY_DVSEC2_BIR_MASK) {
>> +                ret = false;
>> +                goto err;
>> +            }
>> +            bar_offset = UNCORE_DISCOVERY_BIR_BASE +
>> +                     (val & UNCORE_DISCOVERY_DVSEC2_BIR_MASK) * UNCORE_DISCOVERY_BIR_STEP;
>> +
>> +            die = get_device_die_id(dev);
>> +            if (die < 0)
>> +                continue;
>> +
>> +            parse_discovery_table(dev, die, bar_offset, &parsed);
>> +        }
>> +    }
>> +
>> +    /* None of the discovery tables are available */
>> +    if (!parsed)
>> +        ret = false;
>> +err:
>> +    pci_dev_put(dev);
>> +
>> +    return ret;
>> +}
>> +
>> +void intel_uncore_clear_discovery_tables(void)
>> +{
>> +    struct intel_uncore_discovery_type *type, *next;
>> +
>> +    rbtree_postorder_for_each_entry_safe(type, next, &discovery_tables, node) {
>> +        kfree(type->box_ctrl_die);
>> +        kfree(type);
>> +    }
>> +}
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
>> new file mode 100644
>> index 0000000..95afa39
>> --- /dev/null
>> +++ b/arch/x86/events/intel/uncore_discovery.h
>> @@ -0,0 +1,105 @@
>> +/* SPDX-License-Identifier: GPL-2.0-only */
>> +
>> +/* Generic device ID of a discovery table device */
>> +#define UNCORE_DISCOVERY_TABLE_DEVICE        0x09a7
>> +/* Capability ID for a discovery table device */
>> +#define UNCORE_EXT_CAP_ID_DISCOVERY        0x23
>> +/* First DVSEC offset */
>> +#define UNCORE_DISCOVERY_DVSEC_OFFSET        0x8
>> +/* Mask of the supported discovery entry type */
>> +#define UNCORE_DISCOVERY_DVSEC_ID_MASK        0xffff
>> +/* PMON discovery entry type ID */
>> +#define UNCORE_DISCOVERY_DVSEC_ID_PMON        0x1
>> +/* Second DVSEC offset */
>> +#define UNCORE_DISCOVERY_DVSEC2_OFFSET        0xc
>> +/* Mask of the discovery table BAR offset */
>> +#define UNCORE_DISCOVERY_DVSEC2_BIR_MASK    0x7
>> +/* Discovery table BAR base offset */
>> +#define UNCORE_DISCOVERY_BIR_BASE        0x10
>> +/* Discovery table BAR step */
>> +#define UNCORE_DISCOVERY_BIR_STEP        0x4
>> +/* Mask of the discovery table offset */
>> +#define UNCORE_DISCOVERY_MASK            0xf
>> +/* Global discovery table size */
>> +#define UNCORE_DISCOVERY_GLOBAL_MAP_SIZE    0x20
>> +
>> +#define uncore_discovery_invalid_unit(unit)            \
>> +    (!unit.table1 || !unit.ctl || !unit.table3 ||    \
>> +     unit.table1 == -1ULL || unit.ctl == -1ULL ||    \
>> +     unit.table3 == -1ULL)
>> +
>> +enum uncore_access_type {
>> +    UNCORE_ACCESS_MSR    = 0,
>> +    UNCORE_ACCESS_MMIO,
>> +    UNCORE_ACCESS_PCI,
>> +
>> +    UNCORE_ACCESS_MAX,
>> +};
>> +
>> +struct uncore_global_discovery {
>> +    union {
>> +        u64    table1;
>> +        struct {
>> +            u64    type : 8,
>> +                stride : 8,
>> +                max_units : 10,
>> +                __reserved_1 : 36,
>> +                access_type : 2;
>> +        };
>> +    };
>> +
>> +    u64    ctl;        /* Global Control Address */
>> +
>> +    union {
>> +        u64    table3;
>> +        struct {
>> +            u64    status_offset : 8,
>> +                num_status : 16,
>> +                __reserved_2 : 40;
>> +        };
>> +    };
>> +};
>> +
>> +struct uncore_unit_discovery {
>> +    union {
>> +        u64    table1;
>> +        struct {
>> +            u64    num_regs : 8,
>> +                ctl_offset : 8,
>> +                bit_width : 8,
>> +                ctr_offset : 8,
>> +                status_offset : 8,
>> +                __reserved_1 : 22,
>> +                access_type : 2;
>> +        };
>> +    };
>> +
>> +    u64    ctl;        /* Unit Control Address */
>> +
>> +    union {
>> +        u64    table3;
>> +        struct {
>> +            u64    box_type : 16,
>> +                box_id : 16,
>> +                __reserved_2 : 32;
>> +        };
>> +    };
>> +};
>> +
>> +struct intel_uncore_discovery_type {
>> +    struct rb_node    node;
>> +    enum uncore_access_type    access_type;
>> +    u64        box_ctrl;    /* Unit ctrl addr of the first box */
>> +    u64        *box_ctrl_die;    /* Unit ctrl addr of the first box of each die */
>> +    u16        type;        /* Type ID of the uncore block */
>> +    u8        num_counters;
>> +    u8        counter_width;
>> +    u8        ctl_offset;    /* Counter Control 0 offset */
>> +    u8        ctr_offset;    /* Counter 0 offset */
>> +    u16        num_boxes;    /* number of boxes for the uncore block */
>> +    unsigned int    *ids;        /* Box IDs */
>> +    unsigned int    *box_offset;    /* Box offset */
>> +};
>> +
>> +bool intel_uncore_has_discovery_tables(void);
>> +void intel_uncore_clear_discovery_tables(void);
>> -- 
>> 2.7.4
>>

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-07-22 13:04     ` Liang, Kan
@ 2022-07-23 18:56       ` Lucas De Marchi
  2022-07-25 14:51         ` Liang, Kan
  0 siblings, 1 reply; 22+ messages in thread
From: Lucas De Marchi @ 2022-07-23 18:56 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu

On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>
>
>On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>> Hi Kan,
>>
>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com wrote:
>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>
>>> A self-describing mechanism for the uncore PerfMon hardware has been
>>> introduced with the latest Intel platforms. By reading through an MMIO
>>> page worth of information, perf can 'discover' all the standard uncore
>>> PerfMon registers in a machine.
>>>
>>> The discovery mechanism relies on BIOS's support. With a proper BIOS,
>>> a PCI device with the unique capability ID 0x23 can be found on each
>>> die. Perf can retrieve the information of all available uncore PerfMons
>>> from the device via MMIO. The information is composed of one global
>>> discovery table and several unit discovery tables.
>>> - The global discovery table includes global uncore information of the
>>>  die, e.g., the address of the global control register, the offset of
>>>  the global status register, the number of uncore units, the offset of
>>>  unit discovery tables, etc.
>>> - The unit discovery table includes generic uncore unit information,
>>>  e.g., the access type, the counter width, the address of counters,
>>>  the address of the counter control, the unit ID, the unit type, etc.
>>>  The unit is also called "box" in the code.
>>> Perf can provide basic uncore support based on this information
>>> with the following patches.
>>>
>>> To locate the PCI device with the discovery tables, check the generic
>>> PCI ID first. If it doesn't match, go through the entire PCI device tree
>>> and locate the device with the unique capability ID.
>>>
>>> The uncore information is similar among dies. To save parsing time and
>>> space, only completely parse and store the discovery tables on the first
>>> die and the first box of each die. The parsed information is stored in
>>> an
>>> RB tree structure, intel_uncore_discovery_type. The size of the stored
>>> discovery tables varies among platforms. It's around 4KB for a Sapphire
>>> Rapids server.
>>>
>>> If a BIOS doesn't support the 'discovery' mechanism, the uncore driver
>>> will exit with -ENODEV. There is nothing changed.
>>>
>>> Add a module parameter to disable the discovery feature. If a BIOS gets
>>> the discovery tables wrong, users can have an option to disable the
>>> feature. For the current patchset, the uncore driver will exit with
>>> -ENODEV. In the future, it may fall back to the hardcode uncore driver
>>> on a known platform.
>>>
>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>
>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and after
>> bisecting it arrived to this commit. I also verified the same issue is
>> present in 5.19-rc7 and that the issue is gone when booting with
>> intel_uncore.uncore_no_discover.
>>
>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>> reaching pkg c6 state, even if we put it in rc6 state. It seems the pcie
>> link is not idling, preventing it to go to pkg c6.
>>
>> PMON discovery in bios is set to "auto".
>>
>> We do see the following on dmesg while going through this code path:
>>
>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>> 0xffffffffffffffff 0xffffffffffffffff
>
>On SPR, the uncore driver relies on the discovery table provided by the
>BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>you please update to the latest BIOS/firmware and have a try?

Hmm, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
and it remains in D3, so the 0xffffffffffffffff we see above is
just the auto-completion. No wonder the values don't match what we are
expecting here.

Is it expected the device to be in D0? Or should we do anything here to
move it to D0 before doing these reads?

thanks
Lucas De Marchi


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-07-23 18:56       ` Lucas De Marchi
@ 2022-07-25 14:51         ` Liang, Kan
  2022-08-02 14:22           ` Lucas De Marchi
  0 siblings, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2022-07-25 14:51 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu



On 2022-07-23 2:56 p.m., Lucas De Marchi wrote:
> On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>>
>>
>> On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>>> Hi Kan,
>>>
>>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com
>>> wrote:
>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>
>>>> [...]
>>>
>>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and after
>>> bisecting it arrived to this commit. I also verified the same issue is
>>> present in 5.19-rc7 and that the issue is gone when booting with
>>> intel_uncore.uncore_no_discover.
>>>
>>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>>> reaching pkg c6 state, even if we put it in rc6 state. It seems the pcie
>>> link is not idling, preventing it to go to pkg c6.
>>>
>>> PMON discovery in bios is set to "auto".
>>>
>>> We do see the following on dmesg while going through this code path:
>>>
>>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>>> 0xffffffffffffffff 0xffffffffffffffff
>>
>> On SPR, the uncore driver relies on the discovery table provided by the
>> BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>> you please update to the latest BIOS/firmware and have a try?
> 
> hum, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
> and it remains in D3, so the 0xffffffffffffffff we se below is
> just the auto completion. No wonder the values don't match what we are
> expecting here.
> 
> Is it expected the device to be in D0? Or should we do anything here to
> move it to D0 before doing these reads?
> 

It's OK to have a 0x09a7 device. But the device should not claim to
support PMON Discovery if it doesn't comply with the PMON discovery
mechanism.

See section 1.10.1, "Guidance on Finding PMON Discovery and Reading It",
in the SPR uncore document: https://cdrdv2.intel.com/v1/dl/getContent/642245
It demonstrates how the uncore driver finds the device with the PMON
discovery mechanism.

Simply speaking, the uncore driver looks for a DVSEC
structure with the unique capability ID 0x23. Then it checks whether the
PMON discovery entry (0x1) is supported. If both are detected, it means
that the device complies with the PMON discovery mechanism. The uncore
driver will then parse the discovery table.

AFAIK, the PVC gpu doesn't support the PMON discovery mechanism. I guess
the firmware of the PVC gpu mistakenly sets the PMON discovery entry
(0x1). You may want to check the extended capabilities (DVSEC) in the
PCIe configuration space of the PVC gpu device.

Thanks,
Kan


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-07-25 14:51         ` Liang, Kan
@ 2022-08-02 14:22           ` Lucas De Marchi
  2022-08-02 15:43             ` Liang, Kan
  0 siblings, 1 reply; 22+ messages in thread
From: Lucas De Marchi @ 2022-08-02 14:22 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu

On Mon, Jul 25, 2022 at 10:51:44AM -0400, Liang, Kan wrote:
>
>
>On 2022-07-23 2:56 p.m., Lucas De Marchi wrote:
>> On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>>>
>>>
>>> On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>>>> Hi Kan,
>>>>
>>>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com
>>>> wrote:
>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>
>>>>> [...]
>>>>
>>>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and after
>>>> bisecting it arrived to this commit. I also verified the same issue is
>>>> present in 5.19-rc7 and that the issue is gone when booting with
>>>> intel_uncore.uncore_no_discover.
>>>>
>>>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>>>> reaching pkg c6 state, even if we put it in rc6 state. It seems the pcie
>>>> link is not idling, preventing it to go to pkg c6.
>>>>
>>>> PMON discovery in bios is set to "auto".
>>>>
>>>> We do see the following on dmesg while going through this code path:
>>>>
>>>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>>>> 0xffffffffffffffff 0xffffffffffffffff
>>>
>>> On SPR, the uncore driver relies on the discovery table provided by the
>>> BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>>> you please update to the latest BIOS/firmware and have a try?
>>
>> hum, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
>> and it remains in D3, so the 0xffffffffffffffff we se below is
>> just the auto completion. No wonder the values don't match what we are
>> expecting here.
>>
>> Is it expected the device to be in D0? Or should we do anything here to
>> move it to D0 before doing these reads?
>>
>
>It's OK to have a 0x09a7 device. But the device should not claim to
>support the PMON Discovery if it doesn't comply the PMON discovery
>mechanism.
>
>See 1.10.1 Guidance on Finding PMON Discovery and Reading it in SPR
>uncore document. https://cdrdv2.intel.com/v1/dl/getContent/642245
>It demonstrates how the uncore driver find the device with the PMON
>discovery mechanism.

ok, this is exactly the code in the kernel.

>
>Simply speaking, the uncore driver looks for a DVSEC
>structure with an unique capability ID 0x23. Then it checks whether the
>PMON discovery entry (0x1) is supported. If both are detected, it means
>that the device comply the PMON discovery mechanism. The uncore driver
>will be enabled to parse the discovery table.
>
>AFAIK, the PVC gpu doesn't support the PMON discovery mechanism. I guess
>the firmwire of the PVC gpu mistakenly set the PMON discovery entry
>(0x1). You may want to check the extended capabilities (DVSEC) in the
>PCIe configuration space of the PVC gpu device.

However here it seems we have 2 issues being mixed:

1) PVC with that capability when it shouldn't
2) Trying to read the MMIOs when device is possibly in D3 state:

	/* Map whole discovery table */
	addr = pci_dword & ~(PAGE_SIZE - 1);
	io_addr = ioremap(addr, UNCORE_DISCOVERY_MAP_SIZE);

	/* Read Global Discovery table */
	memcpy_fromio(&global, io_addr, sizeof(struct uncore_global_discovery));

Unless it's guaranteed that at this point the device must be in D0
state, this doesn't look right.  When we are binding a driver to a PCI
device, pci core will move it to D0 for us:

	static long local_pci_probe(void *_ddi)
	{
		...
		/*
		 * Unbound PCI devices are always put in D0, regardless of
		 * runtime PM status.  During probe, the device is set to
		 * active and the usage count is incremented.  If the driver
		 * supports runtime PM, it should call pm_runtime_put_noidle(),
		 * or any other runtime PM helper function decrementing the usage
		 * count, in its probe routine and pm_runtime_get_noresume() in
		 * its remove routine.
		 */
		 pm_runtime_get_sync(dev);
		 ...

But here we are traversing the entire PCI device tree by ourselves.
Considering intel_uncore is a module that can be loaded at any time
(even after the driver supporting PVC has bound and already called
pm_runtime_put_noidle()), it looks like we are missing the PM integration
here.

With a quick hack that forces the device into D0 before doing the MMIO
reads, the PM issue is gone (but we still hit the problem of PVC having
the cap when it shouldn't).

thanks
Lucas De Marchi

>
>Thanks,
>Kan


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-08-02 14:22           ` Lucas De Marchi
@ 2022-08-02 15:43             ` Liang, Kan
  2022-08-02 16:02               ` Lucas De Marchi
  0 siblings, 1 reply; 22+ messages in thread
From: Liang, Kan @ 2022-08-02 15:43 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu



On 2022-08-02 10:22 a.m., Lucas De Marchi wrote:
> On Mon, Jul 25, 2022 at 10:51:44AM -0400, Liang, Kan wrote:
>>
>>
>> On 2022-07-23 2:56 p.m., Lucas De Marchi wrote:
>>> On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>>>>
>>>>
>>>> On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>>>>> Hi Kan,
>>>>>
>>>>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com
>>>>> wrote:
>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>
>>>>>> [...]
>>>>>
>>>>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and
>>>>> after
>>>>> bisecting it arrived to this commit. I also verified the same issue is
>>>>> present in 5.19-rc7 and that the issue is gone when booting with
>>>>> intel_uncore.uncore_no_discover.
>>>>>
>>>>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>>>>> reaching pkg c6 state, even if we put it in rc6 state. It seems the
>>>>> pcie
>>>>> link is not idling, preventing it to go to pkg c6.
>>>>>
>>>>> PMON discovery in bios is set to "auto".
>>>>>
>>>>> We do see the following on dmesg while going through this code path:
>>>>>
>>>>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>>>>> 0xffffffffffffffff 0xffffffffffffffff
>>>>
>>>> On SPR, the uncore driver relies on the discovery table provided by the
>>>> BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>>>> you please update to the latest BIOS/firmware and have a try?
>>>
>>> hum, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
>>> and it remains in D3, so the 0xffffffffffffffff we se below is
>>> just the auto completion. No wonder the values don't match what we are
>>> expecting here.
>>>
>>> Is it expected the device to be in D0? Or should we do anything here to
>>> move it to D0 before doing these reads?
>>>
>>
>> It's OK to have a 0x09a7 device. But the device should not claim to
>> support the PMON Discovery if it doesn't comply the PMON discovery
>> mechanism.
>>
>> See 1.10.1 Guidance on Finding PMON Discovery and Reading it in SPR
>> uncore document. https://cdrdv2.intel.com/v1/dl/getContent/642245
>> It demonstrates how the uncore driver find the device with the PMON
>> discovery mechanism.
> 
> ok, this is exactly the code in the kernel.
> 
>>
>> Simply speaking, the uncore driver looks for a DVSEC
>> structure with an unique capability ID 0x23. Then it checks whether the
>> PMON discovery entry (0x1) is supported. If both are detected, it means
>> that the device comply the PMON discovery mechanism. The uncore driver
>> will be enabled to parse the discovery table.
>>
>> AFAIK, the PVC gpu doesn't support the PMON discovery mechanism. I guess
>> the firmwire of the PVC gpu mistakenly set the PMON discovery entry
>> (0x1). You may want to check the extended capabilities (DVSEC) in the
>> PCIe configuration space of the PVC gpu device.
> 
> However here it seems we have 2 issues being mixed:
> 
> 1) PVC with that capability when it shouldn't

This is a firmware/HW issue. If PVC doesn't support the PMON discovery
mechanism, the PVC and its attached OOBMSM device should not enumerate
the discovery mechanism. However, the PVC enumerates the discovery
mechanism here, which doesn't comply with the spec.

The uncore driver prints errors when the non-compliance is detected.
That's expected. There is nothing more SW can do here.

The firmware issue must be fixed.

> 2) Trying to read the MMIOs when device is possibly in D3 state:

The uncore driver skips the device which doesn't support the discovery
mechanism.
If 1) is fixed, the uncore driver will not touch the MMIO space of a PVC
device. The power issue should be gone.

I've already sent you a patch to ignore the OOBMSM device added by PVC;
you can double-check with that patch.

Thanks,
Kan

> 
>     /* Map whole discovery table */
>     addr = pci_dword & ~(PAGE_SIZE - 1);
>     io_addr = ioremap(addr, UNCORE_DISCOVERY_MAP_SIZE);
> 
>     /* Read Global Discovery table */
>     memcpy_fromio(&global, io_addr, sizeof(struct
> uncore_global_discovery));
> 
> Unless it's guaranteed that at this point the device must be in D0
> state, this doesn't look right.  When we are binding a driver to a PCI
> device, pci core will move it to D0 for us:
> 
>     static long local_pci_probe(void *_ddi)
>     {
>         ...
>         /*
>          * Unbound PCI devices are always put in D0, regardless of
>          * runtime PM status.  During probe, the device is set to
>          * active and the usage count is incremented.  If the driver
>          * supports runtime PM, it should call pm_runtime_put_noidle(),
>          * or any other runtime PM helper function decrementing the usage
>          * count, in its probe routine and pm_runtime_get_noresume() in
>          * its remove routine.
>          */
>          pm_runtime_get_sync(dev);
>          ...
> 
> But here we are traversing the entire PCI device tree by ourselves.
> Considering intel_uncore is a module that can be loaded at any time
> (even after the driver supporting PVC, which already called
> pm_runtime_put_noidle()), it looks like we are missing the pm integration
> here.
> 
> On a quick hack, just forcing the device into D0 before doing the MMIO,
> the PM issue is gone (but we still hit the problem of PVC having the cap
> when it shouldn't)
> 
> thanks
> Lucas De Marchi
> 
>>
>> Thanks,
>> Kan

^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-08-02 15:43             ` Liang, Kan
@ 2022-08-02 16:02               ` Lucas De Marchi
  2022-08-02 17:23                 ` Liang, Kan
  0 siblings, 1 reply; 22+ messages in thread
From: Lucas De Marchi @ 2022-08-02 16:02 UTC (permalink / raw)
  To: Liang, Kan
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu

On Tue, Aug 02, 2022 at 11:43:36AM -0400, Liang, Kan wrote:
>
>
>On 2022-08-02 10:22 a.m., Lucas De Marchi wrote:
>> On Mon, Jul 25, 2022 at 10:51:44AM -0400, Liang, Kan wrote:
>>>
>>>
>>> On 2022-07-23 2:56 p.m., Lucas De Marchi wrote:
>>>> On Fri, Jul 22, 2022 at 09:04:43AM -0400, Liang, Kan wrote:
>>>>>
>>>>>
>>>>> On 2022-07-22 8:55 a.m., Lucas De Marchi wrote:
>>>>>> Hi Kan,
>>>>>>
>>>>>> On Wed, Mar 17, 2021 at 10:59:33AM -0700, kan.liang@linux.intel.com
>>>>>> wrote:
>>>>>>> From: Kan Liang <kan.liang@linux.intel.com>
>>>>>>>
>>>>>>> A self-describing mechanism for the uncore PerfMon hardware has been
>>>>>>> introduced with the latest Intel platforms. By reading through an
>>>>>>> MMIO
>>>>>>> page worth of information, perf can 'discover' all the standard
>>>>>>> uncore
>>>>>>> PerfMon registers in a machine.
>>>>>>>
>>>>>>> The discovery mechanism relies on BIOS's support. With a proper BIOS,
>>>>>>> a PCI device with the unique capability ID 0x23 can be found on each
>>>>>>> die. Perf can retrieve the information of all available uncore
>>>>>>> PerfMons
>>>>>>> from the device via MMIO. The information is composed of one global
>>>>>>> discovery table and several unit discovery tables.
>>>>>>> - The global discovery table includes global uncore information of
>>>>>>> the
>>>>>>>  die, e.g., the address of the global control register, the offset of
>>>>>>>  the global status register, the number of uncore units, the
>>>>>>> offset of
>>>>>>>  unit discovery tables, etc.
>>>>>>> - The unit discovery table includes generic uncore unit information,
>>>>>>>  e.g., the access type, the counter width, the address of counters,
>>>>>>>  the address of the counter control, the unit ID, the unit type, etc.
>>>>>>>  The unit is also called "box" in the code.
>>>>>>> Perf can provide basic uncore support based on this information
>>>>>>> with the following patches.
>>>>>>>
>>>>>>> To locate the PCI device with the discovery tables, check the generic
>>>>>>> PCI ID first. If it doesn't match, go through the entire PCI device
>>>>>>> tree
>>>>>>> and locate the device with the unique capability ID.
>>>>>>>
>>>>>>> The uncore information is similar among dies. To save parsing time
>>>>>>> and
>>>>>>> space, only completely parse and store the discovery tables on the
>>>>>>> first
>>>>>>> die and the first box of each die. The parsed information is
>>>>>>> stored in
>>>>>>> an
>>>>>>> RB tree structure, intel_uncore_discovery_type. The size of the
>>>>>>> stored
>>>>>>> discovery tables varies among platforms. It's around 4KB for a
>>>>>>> Sapphire
>>>>>>> Rapids server.
>>>>>>>
>>>>>>> If a BIOS doesn't support the 'discovery' mechanism, the uncore
>>>>>>> driver
>>>>>>> will exit with -ENODEV. Nothing else is changed.
>>>>>>>
>>>>>>> Add a module parameter to disable the discovery feature. If a BIOS
>>>>>>> gets
>>>>>>> the discovery tables wrong, users can have an option to disable the
>>>>>>> feature. For the current patchset, the uncore driver will exit with
>>>>>>> -ENODEV. In the future, it may fall back to the hardcoded uncore
>>>>>>> driver
>>>>>>> on a known platform.
>>>>>>>
>>>>>>> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
>>>>>>
>>>>>> I observed one issue when upgrading a kernel from 5.10 to 5.15 and
>>>>>> after
>>>>>> bisecting it arrived to this commit. I also verified the same issue is
>>>>>> present in 5.19-rc7 and that the issue is gone when booting with
>>>>>> intel_uncore.uncore_no_discover.
>>>>>>
>>>>>> Test system is a SPR host with a PVC gpu. Issue is that PVC is not
>>>>>> reaching pkg c6 state, even if we put it in rc6 state. It seems the
>>>>>> pcie
>>>>>> link is not idling, preventing it to go to pkg c6.
>>>>>>
>>>>>> PMON discovery in bios is set to "auto".
>>>>>>
>>>>>> We do see the following on dmesg while going through this code path:
>>>>>>
>>>>>>     intel_uncore: Invalid Global Discovery State: 0xffffffffffffffff
>>>>>> 0xffffffffffffffff 0xffffffffffffffff
>>>>>
>>>>> On SPR, the uncore driver relies on the discovery table provided by the
>>>>> BIOS/firmware. It looks like your BIOS/firmware is out of date. Could
>>>>> you please update to the latest BIOS/firmware and have a try?
>>>>
>>>> hum, the BIOS is up to date. It seems PVC itself has a 0x09a7 device
>>>> and it remains in D3, so the 0xffffffffffffffff we see below is
>>>> just the auto completion. No wonder the values don't match what we are
>>>> expecting here.
>>>>
>>>> Is the device expected to be in D0? Or should we do anything here to
>>>> move it to D0 before doing these reads?
>>>>
>>>
>>> It's OK to have a 0x09a7 device. But the device should not claim to
>>> support the PMON Discovery if it doesn't comply with the PMON discovery
>>> mechanism.
>>>
>>> See 1.10.1 Guidance on Finding PMON Discovery and Reading it in SPR
>>> uncore document. https://cdrdv2.intel.com/v1/dl/getContent/642245
>>> It demonstrates how the uncore driver finds the device with the PMON
>>> discovery mechanism.
>>
>> ok, this is exactly the code in the kernel.
>>
>>>
>>> Simply speaking, the uncore driver looks for a DVSEC
>>> structure with a unique capability ID 0x23. Then it checks whether the
>>> PMON discovery entry (0x1) is supported. If both are detected, it means
>>> that the device complies with the PMON discovery mechanism. The uncore driver
>>> will be enabled to parse the discovery table.
>>>
>>> AFAIK, the PVC gpu doesn't support the PMON discovery mechanism. I guess
>>> the firmware of the PVC gpu mistakenly set the PMON discovery entry
>>> (0x1). You may want to check the extended capabilities (DVSEC) in the
>>> PCIe configuration space of the PVC gpu device.
>>
>> However here it seems we have 2 issues being mixed:
>>
>> 1) PVC with that capability when it shouldn't
>
>This is a firmware/HW issue. If PVC doesn't support the PMON discovery
>mechanism, the PVC and its attached OOBMSM device should not enumerate
>the discovery mechanism. However, the PVC enumerates the discovery
>mechanism here, which doesn't comply with the spec.
>
>The uncore driver prints errors when the non-compliance is detected.
>That's expected. There is nothing more SW can do here.
>
>The firmware issue must be fixed.

yes, that's what I said. It's exposing the capability when it shouldn't.
That's being worked on from the firmware side already.

>
>> 2) Trying to read the MMIOs when device is possibly in D3 state:
>
>The uncore driver skips the device which doesn't support the discovery
>mechanism.
>If 1) is fixed, the uncore driver will not touch the MMIO space of a PVC
>device. The power issue should be gone.
>
>I've already sent you a patch to ignore the PVC added OOBMSM device, you
>can double check with the patch.

(2) is a more generic issue that I'm mentioning. Forget for a moment we
are talking about PVC - that will be fixed by (1). We are trying to read
the mmio from a device that can be in D3, either because it started in
D3 or because a driver, loaded before intel_uncore, moved it to that
state. That won't work even if the device supports the discovery
mechanism.

Lucas De Marchi

>
>Thanks,
>Kan
>
>>
>>     /* Map whole discovery table */
>>     addr = pci_dword & ~(PAGE_SIZE - 1);
>>     io_addr = ioremap(addr, UNCORE_DISCOVERY_MAP_SIZE);
>>
>>     /* Read Global Discovery table */
>>     memcpy_fromio(&global, io_addr, sizeof(struct
>> uncore_global_discovery));
>>
>> Unless it's guaranteed that at this point the device must be in D0
>> state, this doesn't look right.  When we are binding a driver to a PCI
>> device, pci core will move it to D0 for us:
>>
>>     static long local_pci_probe(void *_ddi)
>>     {
>>         ...
>>         /*
>>          * Unbound PCI devices are always put in D0, regardless of
>>          * runtime PM status.  During probe, the device is set to
>>          * active and the usage count is incremented.  If the driver
>>          * supports runtime PM, it should call pm_runtime_put_noidle(),
>>          * or any other runtime PM helper function decrementing the usage
>>          * count, in its probe routine and pm_runtime_get_noresume() in
>>          * its remove routine.
>>          */
>>          pm_runtime_get_sync(dev);
>>          ...
>>
>> But here we are traversing the entire PCI device tree by ourselves.
>> Considering intel_uncore is a module that can be loaded at any time
>> (even after the driver supporting PVC, which already called
>> pm_runtime_put_noidle()), it looks like we are missing the pm integration
>> here.
>>
>> On a quick hack, just forcing the device into D0 before doing the MMIO,
>> the PM issue is gone (but we still hit the problem of PVC having the cap
>> when it shouldn't)
>>
>> thanks
>> Lucas De Marchi
>>
>>>
>>> Thanks,
>>> Kan


* Re: [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables
  2022-08-02 16:02               ` Lucas De Marchi
@ 2022-08-02 17:23                 ` Liang, Kan
  0 siblings, 0 replies; 22+ messages in thread
From: Liang, Kan @ 2022-08-02 17:23 UTC (permalink / raw)
  To: Lucas De Marchi
  Cc: peterz, mingo, acme, linux-kernel, alexander.shishkin, jolsa,
	eranian, namhyung, ak, tilak.tangudu



On 2022-08-02 12:02 p.m., Lucas De Marchi wrote:
>>> 2) Trying to read the MMIOs when device is possibly in D3 state:
>>
>> The uncore driver skips the device which doesn't support the discovery
>> mechanism.
>> If 1) is fixed, the uncore driver will not touch the MMIO space of a PVC
>> device. The power issue should be gone.
>>
>> I've already sent you a patch to ignore the PVC added OOBMSM device, you
>> can double check with the patch.
> 
> (2) is a more generic issue that I'm mentioning. Forget for a moment we
> are talking about PVC - that will be fixed by (1). We are trying to read
> the mmio from a device that can be in D3, either because it started in
> D3 or because a driver, loaded before intel_uncore, moved it to that
> state. That won't work even if the device supports the discovery
> mechanism.

The uncore driver is designed to only support the *PMON* discovery
table, not all the discovery tables. The DVSEC ID 1 indicates the
*PMON* discovery table.

Other devices which support the discovery mechanism should use
another DVSEC ID. The uncore driver will ignore those devices.

Thanks,
Kan


* Re: [PATCH V2 0/5] Uncore PMON discovery mechanism support
  2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
                   ` (4 preceding siblings ...)
  2021-03-17 17:59 ` [PATCH V2 5/5] perf/x86/intel/uncore: Generic support for the MMIO " kan.liang
@ 2022-09-20 18:25 ` Kin Cho
  5 siblings, 0 replies; 22+ messages in thread
From: Kin Cho @ 2022-09-20 18:25 UTC (permalink / raw)
  To: kan.liang, peterz, mingo, acme, linux-kernel
  Cc: alexander.shishkin, jolsa, eranian, namhyung, ak

Hi Kan,

We're seeing the warning below from uncore_insert_box_info on SPR.
I added a debug print:

     /* Parsing Unit Discovery State */
     for (i = 0; i < global.max_units; i++) {
..
         uncore_insert_box_info(&unit, die, *parsed);
 >>        pr_info("%d 0x%llx\n", i, unit.ctl);

and here's the output:

[   17.758579] intel_uncore: 0 0x2fc0
[   17.763117] intel_uncore: 2 0x2010
..
[   17.935286] intel_uncore: 65 0x87e410a0
[   17.940308] intel_uncore: 66 0x87e21318
[   17.945331] ------------[ cut here ]------------
[   17.946305] WARNING: CPU: 65 PID: 1 at 
arch/x86/events/intel/uncore_discovery.c:184 
intel_uncore_has_discovery_tables+0x4c0/0x65c
..
[   18.161512] intel_uncore: 67 0x87e410a0
[   18.166533] intel_uncore: 68 0x87e21318
..

Any suggestions?

-kin

[   17.945331] ------------[ cut here ]------------
[   17.946305] WARNING: CPU: 65 PID: 1 at 
arch/x86/events/intel/uncore_discovery.c:184 
intel_uncore_has_discovery_tables+0x4c0/0x65c
[   17.946305] Modules linked in:
[   17.946305] CPU: 65 PID: 1 Comm: swapper/0 Not tainted 
5.4.17-2136.313.1-X10-2c+ #4
[   17.946305] Hardware name: Oracle Corporation 
sca-x102c-107-sp/PCA,MB,X10-2c, BIOS 79805101 09/13/2022
[   17.946305] RIP: 0010:intel_uncore_has_discovery_tables+0x4c0/0x65c
[   17.946305] Code: 38 48 63 f0 48 8d 3c b1 45 8b 04 b0 44 89 07 4c 8b 
42 40 45 8b 04 b0 45 89 04 b1 0f b7 75 ca 3b 37 75 cf 4c 89 8d 68 ff ff 
ff <0f> 0b 48 89 cf e8 c6 4f 2b 00 4c 8b 8d 68 ff ff ff 4c 89 cf e8 b7
[   17.946305] RSP: 0000:ff4b04f60006bd08 EFLAGS: 00010246
[   17.946305] RAX: 0000000000000002 RBX: 0000000000000044 RCX: 
ff43a98a4ff1bb30
[   17.946305] RDX: ff43a98a4ff294e0 RSI: 0000000000000003 RDI: 
ff43a98a4ff1bb38
[   17.946305] RBP: ff4b04f60006bdb0 R08: 0000000000018000 R09: 
ff43a98a4ff1b310
[   17.946305] R10: 0000000000000005 R11: ff43a98c7f7fe000 R12: 
ff43a98777d66000
[   17.946305] R13: 0000000000015240 R14: ff4b04f61b286000 R15: 
0000000000000043
[   17.946305] FS:  0000000000000000(0000) GS:ff43a90a7f840000(0000) 
knlGS:0000000000000000
[   17.946305] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   17.946305] CR2: 0000000000000000 CR3: 000000967e00a001 CR4: 
0000000000761ee0
[   17.946305] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 
0000000000000000
[   17.946305] DR3: 0000000000000000 DR6: 00000000fffe07f0 DR7: 
0000000000000400
[   17.946305] PKRU: 55555554
[   17.946305] Call Trace:
[   17.946305]  ? uncore_types_init+0x25f/0x25f
[   17.946305]  intel_uncore_init+0x64/0x50c
[   17.946305]  ? perf_pmu_register+0x2cc/0x403
[   17.946305]  ? uncore_types_init+0x25f/0x25f
[   17.946305]  do_one_initcall+0x52/0x1e1
[   17.946305]  ? trace_event_define_fields_initcall_level+0x2a/0x36
[   17.946305]  kernel_init_freeable+0x1fc/0x2a7
[   17.946305]  ? loglevel+0x5d/0x5d
[   17.946305]  ? rest_init+0xb0/0xb0
[   17.946305]  kernel_init+0xe/0x123
[   17.946305]  ret_from_fork+0x24/0x36
[   17.946305] ---[ end trace d9131e47b8a615f4 ]---


On 3/17/21 10:59 AM, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
>
> Changes since V1:
> - Use the generic rbtree functions, rb_add() and rb_find(). (Patch 1)
> - Add a module parameter, uncore_no_discover. If users don't want the
>    discovery feature, they can set uncore_no_discover=true. (Patch 1)
>
>
> A mechanism of self-describing HW for the uncore PMON has been
> introduced with the latest Intel platforms. By reading through an MMIO
> page worth of information, SW can ‘discover’ all the standard uncore
> PMON registers.
>
> With the discovery mechanism, Perf can
> - Retrieve the generic uncore unit information of all standard uncore
>    blocks, e.g., the address of counters, the address of the counter
>    control, the counter width, the access type, etc.
>    Perf can provide basic uncore support based on this information.
>    For a new platform, perf users will get basic uncore support even if
>    the platform-specific enabling code is not ready yet.
> - Retrieve accurate uncore unit information, e.g., the number of uncore
>    boxes. The number of uncore boxes may be different among machines.
>    Currently, perf hardcodes the maximum number of uncore blocks. On some
>    machines, perf may create a PMU for an unavailable uncore block.
>    Although there is no harm (an unavailable uncore block always reads 0),
>    it may confuse the users. The discovery mechanism can provide the
>    accurate number of available uncore boxes on a machine.
>
> But, the discovery mechanism has some limits,
> - Relies on BIOS support. If a BIOS doesn't support the discovery
>    mechanism, the uncore driver will exit with -ENODEV. Nothing else is
>    changed.
> - Only provide the generic uncore unit information. The information for
>    the advanced features, such as fixed counters, filters, and
>    constraints, cannot be retrieved.
> - Only support the standard PMON blocks. Non-standard PMON blocks, e.g.,
>    free-running counters, are not supported.
> - Only provide an ID for an uncore block. No meaningful name is
>    provided. The uncore_type_&typeID_&boxID will be used as the name.
> - Enabling the PCI and MMIO types of uncore blocks relies on NUMA support.
>    These uncore blocks require the mapping information from a BUS to a
>    die. The current discovery table doesn't provide the mapping
>    information. The pcibus_to_node() from NUMA is used to retrieve the
>    information. If NUMA is not supported, some uncore blocks may be
>    unavailable.
>
> To locate the MMIO page, SW has to find a PCI device with the unique
> capability ID 0x23 and retrieve its BAR address.
>
> The spec can be found at Snow Ridge or Ice Lake server's uncore document.
> https://cdrdv2.intel.com/v1/dl/getContent/611319
>
> Kan Liang (5):
>    perf/x86/intel/uncore: Parse uncore discovery tables
>    perf/x86/intel/uncore: Generic support for the MSR type of uncore
>      blocks
>    perf/x86/intel/uncore: Rename uncore_notifier to
>      uncore_pci_sub_notifier
>    perf/x86/intel/uncore: Generic support for the PCI type of uncore
>      blocks
>    perf/x86/intel/uncore: Generic support for the MMIO type of uncore
>      blocks
>
>   arch/x86/events/intel/Makefile           |   2 +-
>   arch/x86/events/intel/uncore.c           | 188 ++++++++--
>   arch/x86/events/intel/uncore.h           |  10 +-
>   arch/x86/events/intel/uncore_discovery.c | 622 +++++++++++++++++++++++++++++++
>   arch/x86/events/intel/uncore_discovery.h | 131 +++++++
>   5 files changed, 922 insertions(+), 31 deletions(-)
>   create mode 100644 arch/x86/events/intel/uncore_discovery.c
>   create mode 100644 arch/x86/events/intel/uncore_discovery.h
>



end of thread, other threads:[~2022-09-20 18:25 UTC | newest]

Thread overview: 22+ messages
-- links below jump to the message on this page --
2021-03-17 17:59 [PATCH V2 0/5] Uncore PMON discovery mechanism support kan.liang
2021-03-17 17:59 ` [PATCH V2 1/5] perf/x86/intel/uncore: Parse uncore discovery tables kan.liang
2021-03-19  1:10   ` Namhyung Kim
2021-03-19 20:28     ` Liang, Kan
2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
2022-07-22 12:55   ` [PATCH V2 1/5] " Lucas De Marchi
2022-07-22 13:04     ` Liang, Kan
2022-07-23 18:56       ` Lucas De Marchi
2022-07-25 14:51         ` Liang, Kan
2022-08-02 14:22           ` Lucas De Marchi
2022-08-02 15:43             ` Liang, Kan
2022-08-02 16:02               ` Lucas De Marchi
2022-08-02 17:23                 ` Liang, Kan
2021-03-17 17:59 ` [PATCH V2 2/5] perf/x86/intel/uncore: Generic support for the MSR type of uncore blocks kan.liang
2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
2021-03-17 17:59 ` [PATCH V2 3/5] perf/x86/intel/uncore: Rename uncore_notifier to uncore_pci_sub_notifier kan.liang
2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
2021-03-17 17:59 ` [PATCH V2 4/5] perf/x86/intel/uncore: Generic support for the PCI type of uncore blocks kan.liang
2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
2021-03-17 17:59 ` [PATCH V2 5/5] perf/x86/intel/uncore: Generic support for the MMIO " kan.liang
2021-04-02  8:12   ` [tip: perf/core] " tip-bot2 for Kan Liang
2022-09-20 18:25 ` [PATCH V2 0/5] Uncore PMON discovery mechanism support Kin Cho
