linux-kernel.vger.kernel.org archive mirror
 help / color / mirror / Atom feed
* [PATCH 0/7] perf: Add Sapphire Rapids server uncore support
@ 2021-06-24  1:22 kan.liang
  2021-06-24  1:22 ` [PATCH 1/7] driver core: Add a way to get to bus devices kset kan.liang
                   ` (6 more replies)
  0 siblings, 7 replies; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intel Sapphire Rapids supports a discovery mechanism, that allows an
uncore driver to discover the different components ("boxes") of the
chip.

All the generic information of the uncore boxes should be retrieved from
the discovery tables. This has been enabled with the commit edae1f06c2cd
("perf/x86/intel/uncore: Parse uncore discovery tables"). The uncore
driver doesn't need to hard code the generic information for each uncore
box. But we still need to enable various functionality that cannot be
directly discovered. This is done in the patchset.

Without this platform-specific enabling patch set, perf uses a type ID
plus a box ID, e.g., uncore_type_0_0 to name an uncore PMU. With the
patch set, perf has the mapping information from a type ID to a specific
uncore unit. Just like the previous platforms, the uncore PMU can be
named by the real PMU name, e.g., uncore_cha_0. To make the name
backward-compatible, a symlink is created from the new to the old name.
So the user scripts which use the old name still work.

The uncore spec of Sapphire Rapids server can be found at
https://cdrdv2.intel.com/v1/dl/getContent/642245

Kan Liang (7):
  driver core: Add a way to get to bus devices kset
  perf: Create a symlink for a PMU
  perf/x86/intel/uncore: Create a symlink for an uncore PMU
  perf/x86/intel/uncore: Add Sapphire Rapids server support
  perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
  perf/x86/intel/uncore: Support free-running counters on Sapphire
    Rapids server
  perf/x86/intel/uncore: Fix invalid unit check

 arch/x86/events/intel/uncore.c           |  44 ++-
 arch/x86/events/intel/uncore.h           |   5 +
 arch/x86/events/intel/uncore_discovery.c |  42 +--
 arch/x86/events/intel/uncore_discovery.h |  23 +-
 arch/x86/events/intel/uncore_snbep.c     | 508 ++++++++++++++++++++++++++++++-
 drivers/base/bus.c                       |   6 +
 include/linux/device/bus.h               |   1 +
 include/linux/perf_event.h               |   1 +
 kernel/events/core.c                     |  19 ++
 9 files changed, 607 insertions(+), 42 deletions(-)

-- 
2.7.4


^ permalink raw reply	[flat|nested] 33+ messages in thread

* [PATCH 1/7] driver core: Add a way to get to bus devices kset
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  5:41   ` Greg KH
  2021-06-24  1:22 ` [PATCH 2/7] perf: Create a symlink for a PMU kan.liang
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang, gregkh, rafael.j.wysocki

From: Kan Liang <kan.liang@linux.intel.com>

Add an accessor function to get to the bus devices kset associated with
a struct bus_type.

The user of this is the following perf changes, which will need to get
to the kobj of the 'devices' directory of a certain bus.

Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: gregkh@linuxfoundation.org
Cc: rafael.j.wysocki@intel.com
---
 drivers/base/bus.c         | 6 ++++++
 include/linux/device/bus.h | 1 +
 2 files changed, 7 insertions(+)

diff --git a/drivers/base/bus.c b/drivers/base/bus.c
index 36d0c65..3d621a8 100644
--- a/drivers/base/bus.c
+++ b/drivers/base/bus.c
@@ -899,6 +899,12 @@ struct kset *bus_get_kset(struct bus_type *bus)
 }
 EXPORT_SYMBOL_GPL(bus_get_kset);
 
+struct kset *bus_get_devices_kset(struct bus_type *bus)
+{
+	return bus->p->devices_kset;
+}
+EXPORT_SYMBOL_GPL(bus_get_devices_kset);
+
 struct klist *bus_get_device_klist(struct bus_type *bus)
 {
 	return &bus->p->klist_devices;
diff --git a/include/linux/device/bus.h b/include/linux/device/bus.h
index 1ea5e1d..0ab9273 100644
--- a/include/linux/device/bus.h
+++ b/include/linux/device/bus.h
@@ -283,6 +283,7 @@ extern int bus_unregister_notifier(struct bus_type *bus,
 #define BUS_NOTIFY_DRIVER_NOT_BOUND	0x00000008 /* driver fails to be bound */
 
 extern struct kset *bus_get_kset(struct bus_type *bus);
+extern struct kset *bus_get_devices_kset(struct bus_type *bus);
 extern struct klist *bus_get_device_klist(struct bus_type *bus);
 
 #endif
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
  2021-06-24  1:22 ` [PATCH 1/7] driver core: Add a way to get to bus devices kset kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  5:48   ` Greg KH
  2021-06-24  1:22 ` [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU kan.liang
                   ` (4 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

A perf PMU may have two PMU names. For example, Intel Sapphire Rapids
server supports the discovery mechanism. Without the platform-specific
support, an uncore PMU is named by a type ID plus a box ID, e.g.,
uncore_type_0_0, because the real name of the uncore PMU cannot be
retrieved from the discovery table. With the platform-specific support
later, perf has the mapping information from a type ID to a specific
uncore unit. Just like the previous platforms, the uncore PMU will be
named by the real PMU name, e.g., uncore_cha_0. The user scripts which
work well with the old name may not work anymore. To avoid the issue, a
symlink should be created from the new to the old name.

The perf PMU devices are created under /sys/bus/event_source/devices/.
The symlink should be created in the same directory to facilitate the
perf tool.

Add a new variable, link_name, to store the new name of a PMU. Link it
to the old name if applies.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 include/linux/perf_event.h |  1 +
 kernel/events/core.c       | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)

diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index f5a6a2f..c8a3388 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -284,6 +284,7 @@ struct pmu {
 	const struct attribute_group	**attr_groups;
 	const struct attribute_group	**attr_update;
 	const char			*name;
+	const char			*link_name;
 	int				type;
 
 	/*
diff --git a/kernel/events/core.c b/kernel/events/core.c
index f0cc8e5..b6024f6 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -10952,6 +10952,16 @@ static struct bus_type pmu_bus = {
 
 static void pmu_dev_release(struct device *dev)
 {
+	struct pmu *pmu = dev_get_drvdata(dev);
+
+	if (pmu && pmu->link_name) {
+		struct kset *devices_kset;
+
+		devices_kset = bus_get_devices_kset(pmu->dev->bus);
+		if (devices_kset)
+			sysfs_remove_link(&devices_kset->kobj, pmu->link_name);
+	}
+
 	kfree(dev);
 }
 
@@ -10989,6 +10999,15 @@ static int pmu_dev_alloc(struct pmu *pmu)
 	if (ret)
 		goto del_dev;
 
+	if (pmu->link_name) {
+		struct kset *devices_kset;
+
+		devices_kset = bus_get_devices_kset(pmu->dev->bus);
+		if (devices_kset)
+			ret = sysfs_create_link(&devices_kset->kobj,
+						&pmu->dev->kobj,
+						pmu->link_name);
+	}
 out:
 	return ret;
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
  2021-06-24  1:22 ` [PATCH 1/7] driver core: Add a way to get to bus devices kset kan.liang
  2021-06-24  1:22 ` [PATCH 2/7] perf: Create a symlink for a PMU kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  5:44   ` Greg KH
  2021-06-24  1:22 ` [PATCH 4/7] perf/x86/intel/uncore: Add Sapphire Rapids server support kan.liang
                   ` (3 subsequent siblings)
  6 siblings, 1 reply; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The platform specific support for Sapphire Rapids will apply a
meaningful name for each uncore PMU. The script which works well with
the old name may not work anymore because of the name change. To avoid
the issue, a symlink should be created from the new name to the old
name.

Add an variable link_name to store the new name.

The rule to name a new meaningful uncore name is the same as the
previous platforms. Factor out __uncore_get_pmu_name().

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c | 28 ++++++++++++++++++++--------
 arch/x86/events/intel/uncore.h |  2 ++
 2 files changed, 22 insertions(+), 8 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 9bf4dbb..04e5d37 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -842,6 +842,18 @@ static const struct attribute_group uncore_pmu_attr_group = {
 	.attrs = uncore_pmu_attrs,
 };
 
+static void __uncore_get_pmu_name(char *pmu_name, const char *type_name,
+				  int num_boxes, int idx)
+{
+	if (num_boxes == 1) {
+		if (strlen(type_name) > 0)
+			sprintf(pmu_name, "uncore_%s", type_name);
+		else
+			sprintf(pmu_name, "uncore");
+	} else
+		sprintf(pmu_name, "uncore_%s_%d", type_name, idx);
+}
+
 static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
 {
 	struct intel_uncore_type *type = pmu->type;
@@ -857,17 +869,17 @@ static void uncore_get_pmu_name(struct intel_uncore_pmu *pmu)
 			sprintf(pmu->name, "uncore_type_%u_%d",
 				type->type_id, type->box_ids[pmu->pmu_idx]);
 		}
+
+		if (type->link_name) {
+			__uncore_get_pmu_name(pmu->link_name, type->link_name,
+					      type->num_boxes, type->box_ids[pmu->pmu_idx]);
+			pmu->pmu.link_name = pmu->link_name;
+		}
 		return;
 	}
 
-	if (type->num_boxes == 1) {
-		if (strlen(type->name) > 0)
-			sprintf(pmu->name, "uncore_%s", type->name);
-		else
-			sprintf(pmu->name, "uncore");
-	} else
-		sprintf(pmu->name, "uncore_%s_%d", type->name, pmu->pmu_idx);
-
+	__uncore_get_pmu_name(pmu->name, type->name,
+			      type->num_boxes, pmu->pmu_idx);
 }
 
 static int uncore_pmu_register(struct intel_uncore_pmu *pmu)
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 187d728..2fc8565 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -46,6 +46,7 @@ struct intel_uncore_topology;
 
 struct intel_uncore_type {
 	const char *name;
+	const char *link_name;
 	int num_counters;
 	int num_boxes;
 	int perf_ctr_bits;
@@ -118,6 +119,7 @@ struct intel_uncore_ops {
 struct intel_uncore_pmu {
 	struct pmu			pmu;
 	char				name[UNCORE_PMU_NAME_LEN];
+	char				link_name[UNCORE_PMU_NAME_LEN];
 	int				pmu_idx;
 	int				func_id;
 	bool				registered;
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 4/7] perf/x86/intel/uncore: Add Sapphire Rapids server support
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
                   ` (2 preceding siblings ...)
  2021-06-24  1:22 ` [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  1:22 ` [PATCH 5/7] perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() kan.liang
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Intel Sapphire Rapids supports a discovery mechanism, that allows an
uncore driver to discover the different components ("boxes") of the
chip.

All the generic information of the uncore boxes should be retrieved from
the discovery tables. This has been enabled with the commit edae1f06c2cd
("perf/x86/intel/uncore: Parse uncore discovery tables"). Add
use_discovery to indicate the case. The uncore driver doesn't need to
hard code the generic information for each uncore box.

But we still need to enable various functionality that cannot be
directly discovered. This is done here.
 - Add a meaningful name for each uncore block.
 - Add CHA filter support.
 - The layout of the control registers for each uncore block is a little
   bit different from the generic one. Set the platform specific format
   and ops. Expose the common ops which can be reused.
 - Add a fixed counter for IMC

All the undiscovered platform-specific features are hard code in the
spr_uncores[]. Add uncore_type_customized_copy(), instead of the memcpy,
to only overwrite these features.

Only the uncore blocks which are inculded in the discovery tables are
enabled here. Other uncore blocks, e.g., free-running counters, will be
supported in the following patch.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore.c           |  16 +-
 arch/x86/events/intel/uncore.h           |   3 +
 arch/x86/events/intel/uncore_discovery.c |  32 ++--
 arch/x86/events/intel/uncore_discovery.h |  21 +++
 arch/x86/events/intel/uncore_snbep.c     | 294 +++++++++++++++++++++++++++++++
 5 files changed, 349 insertions(+), 17 deletions(-)

diff --git a/arch/x86/events/intel/uncore.c b/arch/x86/events/intel/uncore.c
index 04e5d37..d2837b6 100644
--- a/arch/x86/events/intel/uncore.c
+++ b/arch/x86/events/intel/uncore.c
@@ -1675,6 +1675,7 @@ struct intel_uncore_init_fun {
 	void	(*cpu_init)(void);
 	int	(*pci_init)(void);
 	void	(*mmio_init)(void);
+	bool	use_discovery;
 };
 
 static const struct intel_uncore_init_fun nhm_uncore_init __initconst = {
@@ -1777,6 +1778,13 @@ static const struct intel_uncore_init_fun snr_uncore_init __initconst = {
 	.mmio_init = snr_uncore_mmio_init,
 };
 
+static const struct intel_uncore_init_fun spr_uncore_init __initconst = {
+	.cpu_init = spr_uncore_cpu_init,
+	.pci_init = spr_uncore_pci_init,
+	.mmio_init = spr_uncore_mmio_init,
+	.use_discovery = true,
+};
+
 static const struct intel_uncore_init_fun generic_uncore_init __initconst = {
 	.cpu_init = intel_uncore_generic_uncore_cpu_init,
 	.pci_init = intel_uncore_generic_uncore_pci_init,
@@ -1821,6 +1829,7 @@ static const struct x86_cpu_id intel_uncore_match[] __initconst = {
 	X86_MATCH_INTEL_FAM6_MODEL(ROCKETLAKE,		&rkl_uncore_init),
 	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE,		&adl_uncore_init),
 	X86_MATCH_INTEL_FAM6_MODEL(ALDERLAKE_L,		&adl_uncore_init),
+	X86_MATCH_INTEL_FAM6_MODEL(SAPPHIRERAPIDS_X,	&spr_uncore_init),
 	X86_MATCH_INTEL_FAM6_MODEL(ATOM_TREMONT_D,	&snr_uncore_init),
 	{},
 };
@@ -1844,8 +1853,13 @@ static int __init intel_uncore_init(void)
 			uncore_init = (struct intel_uncore_init_fun *)&generic_uncore_init;
 		else
 			return -ENODEV;
-	} else
+	} else {
 		uncore_init = (struct intel_uncore_init_fun *)id->driver_data;
+		if (uncore_no_discover && uncore_init->use_discovery)
+			return -ENODEV;
+		if (uncore_init->use_discovery && !intel_uncore_has_discovery_tables())
+			return -ENODEV;
+	}
 
 	if (uncore_init->pci_init) {
 		pret = uncore_init->pci_init();
diff --git a/arch/x86/events/intel/uncore.h b/arch/x86/events/intel/uncore.h
index 2fc8565..050d7ff 100644
--- a/arch/x86/events/intel/uncore.h
+++ b/arch/x86/events/intel/uncore.h
@@ -610,6 +610,9 @@ void snr_uncore_mmio_init(void);
 int icx_uncore_pci_init(void);
 void icx_uncore_cpu_init(void);
 void icx_uncore_mmio_init(void);
+int spr_uncore_pci_init(void);
+void spr_uncore_cpu_init(void);
+void spr_uncore_mmio_init(void);
 
 /* uncore_nhmex.c */
 void nhmex_uncore_cpu_init(void);
diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index aba9bff..2de5563 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -337,17 +337,17 @@ static const struct attribute_group generic_uncore_format_group = {
 	.attrs = generic_uncore_formats_attr,
 };
 
-static void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box)
 {
 	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_INT);
 }
 
-static void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box)
 {
 	wrmsrl(uncore_msr_box_ctl(box), GENERIC_PMON_BOX_CTL_FRZ);
 }
 
-static void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box)
 {
 	wrmsrl(uncore_msr_box_ctl(box), 0);
 }
@@ -377,7 +377,7 @@ static struct intel_uncore_ops generic_uncore_msr_ops = {
 	.read_counter		= uncore_msr_read_counter,
 };
 
-static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	int box_ctl = uncore_pci_box_ctl(box);
@@ -386,7 +386,7 @@ static void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box)
 	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_INT);
 }
 
-static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	int box_ctl = uncore_pci_box_ctl(box);
@@ -394,7 +394,7 @@ static void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box)
 	pci_write_config_dword(pdev, box_ctl, GENERIC_PMON_BOX_CTL_FRZ);
 }
 
-static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	int box_ctl = uncore_pci_box_ctl(box);
@@ -403,7 +403,7 @@ static void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box)
 }
 
 static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
-					    struct perf_event *event)
+						  struct perf_event *event)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	struct hw_perf_event *hwc = &event->hw;
@@ -411,8 +411,8 @@ static void intel_generic_uncore_pci_enable_event(struct intel_uncore_box *box,
 	pci_write_config_dword(pdev, hwc->config_base, hwc->config);
 }
 
-static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
-					     struct perf_event *event)
+void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+					    struct perf_event *event)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	struct hw_perf_event *hwc = &event->hw;
@@ -420,8 +420,8 @@ static void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
 	pci_write_config_dword(pdev, hwc->config_base, 0);
 }
 
-static u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
-					   struct perf_event *event)
+u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+					  struct perf_event *event)
 {
 	struct pci_dev *pdev = box->pci_dev;
 	struct hw_perf_event *hwc = &event->hw;
@@ -454,7 +454,7 @@ static unsigned int generic_uncore_mmio_box_ctl(struct intel_uncore_box *box)
 	return type->box_ctls[box->dieid] + type->mmio_offsets[box->pmu->pmu_idx];
 }
 
-static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
 {
 	unsigned int box_ctl = generic_uncore_mmio_box_ctl(box);
 	struct intel_uncore_type *type = box->pmu->type;
@@ -478,7 +478,7 @@ static void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box)
 	writel(GENERIC_PMON_BOX_CTL_INT, box->io_addr);
 }
 
-static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
 {
 	if (!box->io_addr)
 		return;
@@ -486,7 +486,7 @@ static void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box)
 	writel(GENERIC_PMON_BOX_CTL_FRZ, box->io_addr);
 }
 
-static void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
+void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box)
 {
 	if (!box->io_addr)
 		return;
@@ -505,7 +505,7 @@ static void intel_generic_uncore_mmio_enable_event(struct intel_uncore_box *box,
 	writel(hwc->config, box->io_addr + hwc->config_base);
 }
 
-static void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
 					      struct perf_event *event)
 {
 	struct hw_perf_event *hwc = &event->hw;
@@ -568,7 +568,7 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 	return true;
 }
 
-static struct intel_uncore_type **
+struct intel_uncore_type **
 intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
 {
 	struct intel_uncore_discovery_type *type;
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 1d65293..b85655b 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -129,3 +129,24 @@ void intel_uncore_clear_discovery_tables(void);
 void intel_uncore_generic_uncore_cpu_init(void);
 int intel_uncore_generic_uncore_pci_init(void);
 void intel_uncore_generic_uncore_mmio_init(void);
+
+void intel_generic_uncore_msr_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_msr_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_msr_enable_box(struct intel_uncore_box *box);
+
+void intel_generic_uncore_mmio_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_enable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_mmio_disable_event(struct intel_uncore_box *box,
+					     struct perf_event *event);
+
+void intel_generic_uncore_pci_init_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_disable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_enable_box(struct intel_uncore_box *box);
+void intel_generic_uncore_pci_disable_event(struct intel_uncore_box *box,
+					    struct perf_event *event);
+u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
+					  struct perf_event *event);
+
+struct intel_uncore_type **
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 7622762..a4f436a 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -1,6 +1,7 @@
 // SPDX-License-Identifier: GPL-2.0
 /* SandyBridge-EP/IvyTown uncore support */
 #include "uncore.h"
+#include "uncore_discovery.h"
 
 /* SNB-EP pci bus to socket mapping */
 #define SNBEP_CPUNODEID			0x40
@@ -454,6 +455,17 @@
 #define ICX_NUMBER_IMC_CHN			2
 #define ICX_IMC_MEM_STRIDE			0x4
 
+/* SPR */
+#define SPR_RAW_EVENT_MASK_EXT			0xffffff
+
+/* SPR CHA */
+#define SPR_CHA_PMON_CTL_TID_EN			(1 << 16)
+#define SPR_CHA_PMON_EVENT_MASK			(SNBEP_PMON_RAW_EVENT_MASK | \
+						 SPR_CHA_PMON_CTL_TID_EN)
+#define SPR_CHA_PMON_BOX_FILTER_TID		0x3ff
+
+#define SPR_C0_MSR_PMON_BOX_FILTER0		0x200e
+
 DEFINE_UNCORE_FORMAT_ATTR(event, event, "config:0-7");
 DEFINE_UNCORE_FORMAT_ATTR(event2, event, "config:0-6");
 DEFINE_UNCORE_FORMAT_ATTR(event_ext, event, "config:0-7,21");
@@ -466,6 +478,7 @@ DEFINE_UNCORE_FORMAT_ATTR(umask_ext4, umask, "config:8-15,32-55");
 DEFINE_UNCORE_FORMAT_ATTR(qor, qor, "config:16");
 DEFINE_UNCORE_FORMAT_ATTR(edge, edge, "config:18");
 DEFINE_UNCORE_FORMAT_ATTR(tid_en, tid_en, "config:19");
+DEFINE_UNCORE_FORMAT_ATTR(tid_en2, tid_en, "config:16");
 DEFINE_UNCORE_FORMAT_ATTR(inv, inv, "config:23");
 DEFINE_UNCORE_FORMAT_ATTR(thresh9, thresh, "config:24-35");
 DEFINE_UNCORE_FORMAT_ATTR(thresh8, thresh, "config:24-31");
@@ -5504,3 +5517,284 @@ void icx_uncore_mmio_init(void)
 }
 
 /* end of ICX uncore support */
+
+/* SPR uncore support */
+
+static void spr_uncore_msr_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct hw_perf_event_extra *reg1 = &hwc->extra_reg;
+
+	if (reg1->idx != EXTRA_REG_NONE)
+		wrmsrl(reg1->reg, reg1->config);
+
+	wrmsrl(hwc->config_base, hwc->config);
+}
+
+static void spr_uncore_msr_disable_event(struct intel_uncore_box *box,
+					 struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct hw_perf_event_extra *reg1 = &hwc->extra_reg;
+
+	if (reg1->idx != EXTRA_REG_NONE)
+		wrmsrl(reg1->reg, 0);
+
+	wrmsrl(hwc->config_base, 0);
+}
+
+static int spr_cha_hw_config(struct intel_uncore_box *box, struct perf_event *event)
+{
+	struct hw_perf_event_extra *reg1 = &event->hw.extra_reg;
+	bool tie_en = !!(event->hw.config & SPR_CHA_PMON_CTL_TID_EN);
+	struct intel_uncore_type *type = box->pmu->type;
+
+	if (tie_en) {
+		reg1->reg = SPR_C0_MSR_PMON_BOX_FILTER0 +
+			    HSWEP_CBO_MSR_OFFSET * type->box_ids[box->pmu->pmu_idx];
+		reg1->config = event->attr.config1 & SPR_CHA_PMON_BOX_FILTER_TID;
+		reg1->idx = 0;
+	}
+
+	return 0;
+}
+
+static struct intel_uncore_ops spr_uncore_chabox_ops = {
+	.init_box		= intel_generic_uncore_msr_init_box,
+	.disable_box		= intel_generic_uncore_msr_disable_box,
+	.enable_box		= intel_generic_uncore_msr_enable_box,
+	.disable_event		= spr_uncore_msr_disable_event,
+	.enable_event		= spr_uncore_msr_enable_event,
+	.read_counter		= uncore_msr_read_counter,
+	.hw_config		= spr_cha_hw_config,
+	.get_constraint		= uncore_get_constraint,
+	.put_constraint		= uncore_put_constraint,
+};
+
+static struct attribute *spr_uncore_cha_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask_ext4.attr,
+	&format_attr_tid_en2.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh8.attr,
+	&format_attr_filter_tid5.attr,
+	NULL,
+};
+static const struct attribute_group spr_uncore_chabox_format_group = {
+	.name = "format",
+	.attrs = spr_uncore_cha_formats_attr,
+};
+
+static struct intel_uncore_type spr_uncore_chabox = {
+	.link_name		= "cha",
+	.event_mask		= SPR_CHA_PMON_EVENT_MASK,
+	.event_mask_ext		= SPR_RAW_EVENT_MASK_EXT,
+	.num_shared_regs	= 1,
+	.ops			= &spr_uncore_chabox_ops,
+	.format_group		= &spr_uncore_chabox_format_group,
+};
+
+static struct intel_uncore_type spr_uncore_iio = {
+	.link_name		= "iio",
+	.event_mask		= SNBEP_PMON_RAW_EVENT_MASK,
+	.event_mask_ext		= SNR_IIO_PMON_RAW_EVENT_MASK_EXT,
+	.format_group		= &snr_uncore_iio_format_group,
+};
+
+static struct attribute *spr_uncore_raw_formats_attr[] = {
+	&format_attr_event.attr,
+	&format_attr_umask_ext4.attr,
+	&format_attr_edge.attr,
+	&format_attr_inv.attr,
+	&format_attr_thresh8.attr,
+	NULL,
+};
+
+static const struct attribute_group spr_uncore_raw_format_group = {
+	.name = "format",
+	.attrs = spr_uncore_raw_formats_attr,
+};
+
+#define SPR_UNCORE_COMMON_FORMAT()				\
+	.event_mask		= SNBEP_PMON_RAW_EVENT_MASK,	\
+	.event_mask_ext		= SPR_RAW_EVENT_MASK_EXT,	\
+	.format_group		= &spr_uncore_raw_format_group
+
+static struct intel_uncore_type spr_uncore_irp = {
+	SPR_UNCORE_COMMON_FORMAT(),
+	.link_name		= "irp",
+
+};
+
+static struct intel_uncore_type spr_uncore_m2pcie = {
+	SPR_UNCORE_COMMON_FORMAT(),
+	.link_name		= "m2pcie",
+};
+
+static struct intel_uncore_type spr_uncore_pcu = {
+	.link_name		= "pcu",
+};
+
+static void spr_uncore_mmio_enable_event(struct intel_uncore_box *box,
+					 struct perf_event *event)
+{
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (!box->io_addr)
+		return;
+
+	if (uncore_pmc_fixed(hwc->idx))
+		writel(SNBEP_PMON_CTL_EN, box->io_addr + hwc->config_base);
+	else
+		writel(hwc->config, box->io_addr + hwc->config_base);
+}
+
+static struct intel_uncore_ops spr_uncore_mmio_ops = {
+	.init_box		= intel_generic_uncore_mmio_init_box,
+	.exit_box		= uncore_mmio_exit_box,
+	.disable_box		= intel_generic_uncore_mmio_disable_box,
+	.enable_box		= intel_generic_uncore_mmio_enable_box,
+	.disable_event		= intel_generic_uncore_mmio_disable_event,
+	.enable_event		= spr_uncore_mmio_enable_event,
+	.read_counter		= uncore_mmio_read_counter,
+};
+
+static struct intel_uncore_type spr_uncore_imc = {
+	SPR_UNCORE_COMMON_FORMAT(),
+	.link_name		= "imc",
+	.fixed_ctr_bits		= 48,
+	.fixed_ctr		= SNR_IMC_MMIO_PMON_FIXED_CTR,
+	.fixed_ctl		= SNR_IMC_MMIO_PMON_FIXED_CTL,
+	.ops			= &spr_uncore_mmio_ops,
+};
+
+static void spr_uncore_pci_enable_event(struct intel_uncore_box *box,
+					struct perf_event *event)
+{
+	struct pci_dev *pdev = box->pci_dev;
+	struct hw_perf_event *hwc = &event->hw;
+
+	pci_write_config_dword(pdev, hwc->config_base + 4, (u32)(hwc->config >> 32));
+	pci_write_config_dword(pdev, hwc->config_base, (u32)hwc->config);
+}
+
+static struct intel_uncore_ops spr_uncore_pci_ops = {
+	.init_box		= intel_generic_uncore_pci_init_box,
+	.disable_box		= intel_generic_uncore_pci_disable_box,
+	.enable_box		= intel_generic_uncore_pci_enable_box,
+	.disable_event		= intel_generic_uncore_pci_disable_event,
+	.enable_event		= spr_uncore_pci_enable_event,
+	.read_counter		= intel_generic_uncore_pci_read_counter,
+};
+
+#define SPR_UNCORE_PCI_COMMON_FORMAT()			\
+	SPR_UNCORE_COMMON_FORMAT(),			\
+	.ops			= &spr_uncore_pci_ops
+
+static struct intel_uncore_type spr_uncore_m2m = {
+	SPR_UNCORE_PCI_COMMON_FORMAT(),
+	.link_name		= "m2m",
+};
+
+static struct intel_uncore_type spr_uncore_upi = {
+	SPR_UNCORE_PCI_COMMON_FORMAT(),
+	.link_name		= "upi",
+};
+
+static struct intel_uncore_type spr_uncore_m3upi = {
+	SPR_UNCORE_PCI_COMMON_FORMAT(),
+	.link_name		= "m3upi",
+};
+
+static struct intel_uncore_type spr_uncore_mdf = {
+	SPR_UNCORE_COMMON_FORMAT(),
+	.link_name		= "mdf",
+};
+
+#define UNCORE_SPR_NUM_UNCORE_TYPES		12
+
+static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
+	&spr_uncore_chabox,
+	&spr_uncore_iio,
+	&spr_uncore_irp,
+	&spr_uncore_m2pcie,
+	&spr_uncore_pcu,
+	NULL,
+	&spr_uncore_imc,
+	&spr_uncore_m2m,
+	&spr_uncore_upi,
+	&spr_uncore_m3upi,
+	NULL,
+	&spr_uncore_mdf,
+};
+
+static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
+					struct intel_uncore_type *from_type)
+{
+	if (!to_type || !from_type)
+		return;
+
+	if (from_type->name)
+		to_type->name = from_type->name;
+	if (from_type->link_name)
+		to_type->link_name = from_type->link_name;
+	if (from_type->fixed_ctr_bits)
+		to_type->fixed_ctr_bits = from_type->fixed_ctr_bits;
+	if (from_type->event_mask)
+		to_type->event_mask = from_type->event_mask;
+	if (from_type->event_mask_ext)
+		to_type->event_mask_ext = from_type->event_mask_ext;
+	if (from_type->fixed_ctr)
+		to_type->fixed_ctr = from_type->fixed_ctr;
+	if (from_type->fixed_ctl)
+		to_type->fixed_ctl = from_type->fixed_ctl;
+	if (from_type->fixed_ctr_bits)
+		to_type->fixed_ctr_bits = from_type->fixed_ctr_bits;
+	if (from_type->num_shared_regs)
+		to_type->num_shared_regs = from_type->num_shared_regs;
+	if (from_type->constraints)
+		to_type->constraints = from_type->constraints;
+	if (from_type->ops)
+		to_type->ops = from_type->ops;
+	if (from_type->event_descs)
+		to_type->event_descs = from_type->event_descs;
+	if (from_type->format_group)
+		to_type->format_group = from_type->format_group;
+}
+
+static struct intel_uncore_type **
+uncore_get_uncores(enum uncore_access_type type_id)
+{
+	struct intel_uncore_type **types, **start_types;
+
+	start_types = types = intel_uncore_generic_init_uncores(type_id);
+
+	/* Only copy the customized features */
+	for (; *types; types++) {
+		if ((*types)->type_id >= UNCORE_SPR_NUM_UNCORE_TYPES)
+			continue;
+		uncore_type_customized_copy(*types, spr_uncores[(*types)->type_id]);
+	}
+
+	return start_types;
+}
+
+void spr_uncore_cpu_init(void)
+{
+	uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR);
+}
+
+int spr_uncore_pci_init(void)
+{
+	uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI);
+	return 0;
+}
+
+void spr_uncore_mmio_init(void)
+{
+	uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO);
+}
+
+/* end of SPR uncore support */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 5/7] perf/x86/intel/uncore: Factor out snr_uncore_mmio_map()
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
                   ` (3 preceding siblings ...)
  2021-06-24  1:22 ` [PATCH 4/7] perf/x86/intel/uncore: Add Sapphire Rapids server support kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  1:22 ` [PATCH 6/7] perf/x86/intel/uncore: Support free-running counters on Sapphire Rapids server kan.liang
  2021-06-24  1:22 ` [PATCH 7/7] perf/x86/intel/uncore: Fix invalid unit check kan.liang
  6 siblings, 0 replies; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

The IMC free-running counters on Sapphire Rapids server are also
accessed by MMIO, which is similar to the previous platforms, SNR and
ICX. The only difference is the device ID of the device which contains
BAR address.

Factor out snr_uncore_mmio_map() which ioremap the MMIO space. It can be
reused in the following patch for SPR.

Use the snr_uncore_mmio_map() in the icx_uncore_imc_freerunning_init_box().
There is no box_ctl for the free-running counters.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore_snbep.c | 36 +++++++++++++++++++++++++-----------
 1 file changed, 25 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index a4f436a..2696657 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -4792,13 +4792,15 @@ int snr_uncore_pci_init(void)
 	return 0;
 }
 
-static struct pci_dev *snr_uncore_get_mc_dev(int id)
+#define SNR_MC_DEVICE_ID	0x3451
+
+static struct pci_dev *snr_uncore_get_mc_dev(unsigned int device, int id)
 {
 	struct pci_dev *mc_dev = NULL;
 	int pkg;
 
 	while (1) {
-		mc_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x3451, mc_dev);
+		mc_dev = pci_get_device(PCI_VENDOR_ID_INTEL, device, mc_dev);
 		if (!mc_dev)
 			break;
 		pkg = uncore_pcibus_to_dieid(mc_dev->bus);
@@ -4808,16 +4810,17 @@ static struct pci_dev *snr_uncore_get_mc_dev(int id)
 	return mc_dev;
 }
 
-static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
-				       unsigned int box_ctl, int mem_offset)
+static int snr_uncore_mmio_map(struct intel_uncore_box *box,
+			       unsigned int box_ctl, int mem_offset,
+			       unsigned int device)
 {
-	struct pci_dev *pdev = snr_uncore_get_mc_dev(box->dieid);
+	struct pci_dev *pdev = snr_uncore_get_mc_dev(device, box->dieid);
 	struct intel_uncore_type *type = box->pmu->type;
 	resource_size_t addr;
 	u32 pci_dword;
 
 	if (!pdev)
-		return;
+		return -ENODEV;
 
 	pci_read_config_dword(pdev, SNR_IMC_MMIO_BASE_OFFSET, &pci_dword);
 	addr = (pci_dword & SNR_IMC_MMIO_BASE_MASK) << 23;
@@ -4830,16 +4833,25 @@ static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
 	box->io_addr = ioremap(addr, type->mmio_map_size);
 	if (!box->io_addr) {
 		pr_warn("perf uncore: Failed to ioremap for %s.\n", type->name);
-		return;
+		return -EINVAL;
 	}
 
-	writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
+	return 0;
+}
+
+static void __snr_uncore_mmio_init_box(struct intel_uncore_box *box,
+				       unsigned int box_ctl, int mem_offset,
+				       unsigned int device)
+{
+	if (!snr_uncore_mmio_map(box, box_ctl, mem_offset, device))
+		writel(IVBEP_PMON_BOX_CTL_INT, box->io_addr);
 }
 
 static void snr_uncore_mmio_init_box(struct intel_uncore_box *box)
 {
 	__snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box),
-				   SNR_IMC_MMIO_MEM0_OFFSET);
+				   SNR_IMC_MMIO_MEM0_OFFSET,
+				   SNR_MC_DEVICE_ID);
 }
 
 static void snr_uncore_mmio_disable_box(struct intel_uncore_box *box)
@@ -5413,7 +5425,8 @@ static void icx_uncore_imc_init_box(struct intel_uncore_box *box)
 	int mem_offset = (box->pmu->pmu_idx / ICX_NUMBER_IMC_CHN) * ICX_IMC_MEM_STRIDE +
 			 SNR_IMC_MMIO_MEM0_OFFSET;
 
-	__snr_uncore_mmio_init_box(box, box_ctl, mem_offset);
+	__snr_uncore_mmio_init_box(box, box_ctl, mem_offset,
+				   SNR_MC_DEVICE_ID);
 }
 
 static struct intel_uncore_ops icx_uncore_mmio_ops = {
@@ -5483,7 +5496,8 @@ static void icx_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
 	int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE +
 			 SNR_IMC_MMIO_MEM0_OFFSET;
 
-	__snr_uncore_mmio_init_box(box, uncore_mmio_box_ctl(box), mem_offset);
+	snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+			    mem_offset, SNR_MC_DEVICE_ID);
 }
 
 static struct intel_uncore_ops icx_uncore_imc_freerunning_ops = {
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 6/7] perf/x86/intel/uncore: Support free-running counters on Sapphire Rapids server
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
                   ` (4 preceding siblings ...)
  2021-06-24  1:22 ` [PATCH 5/7] perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() kan.liang
@ 2021-06-24  1:22 ` kan.liang
  2021-06-24  1:22 ` [PATCH 7/7] perf/x86/intel/uncore: Fix invalid unit check kan.liang
  6 siblings, 0 replies; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel; +Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang

From: Kan Liang <kan.liang@linux.intel.com>

Several free-running counters for IMC and IIO uncore blocks are
supported on Sapphire Rapids server.

They are not enumerated in the discovery tables. Extend
generic_init_uncores() to support extra uncore types. The uncore types
for the free-running counters is inserted right after the uncore types
retrieved from the discovery table.

The number of the free-running counter boxes is calculated from the
number of corresponding standard boxes.

Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
 arch/x86/events/intel/uncore_discovery.c |  10 +-
 arch/x86/events/intel/uncore_discovery.h |   2 +-
 arch/x86/events/intel/uncore_snbep.c     | 188 ++++++++++++++++++++++++++++++-
 3 files changed, 189 insertions(+), 11 deletions(-)

diff --git a/arch/x86/events/intel/uncore_discovery.c b/arch/x86/events/intel/uncore_discovery.c
index 2de5563..8e9f1df 100644
--- a/arch/x86/events/intel/uncore_discovery.c
+++ b/arch/x86/events/intel/uncore_discovery.c
@@ -569,7 +569,7 @@ static bool uncore_update_uncore_type(enum uncore_access_type type_id,
 }
 
 struct intel_uncore_type **
-intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id, int num_extra)
 {
 	struct intel_uncore_discovery_type *type;
 	struct intel_uncore_type **uncores;
@@ -577,7 +577,7 @@ intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
 	struct rb_node *node;
 	int i = 0;
 
-	uncores = kcalloc(num_discovered_types[type_id] + 1,
+	uncores = kcalloc(num_discovered_types[type_id] + num_extra + 1,
 			  sizeof(struct intel_uncore_type *), GFP_KERNEL);
 	if (!uncores)
 		return empty_uncore;
@@ -606,17 +606,17 @@ intel_uncore_generic_init_uncores(enum uncore_access_type type_id)
 
 void intel_uncore_generic_uncore_cpu_init(void)
 {
-	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR);
+	uncore_msr_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MSR, 0);
 }
 
 int intel_uncore_generic_uncore_pci_init(void)
 {
-	uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI);
+	uncore_pci_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_PCI, 0);
 
 	return 0;
 }
 
 void intel_uncore_generic_uncore_mmio_init(void)
 {
-	uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO);
+	uncore_mmio_uncores = intel_uncore_generic_init_uncores(UNCORE_ACCESS_MMIO, 0);
 }
diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index b85655b..7280c8a 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -149,4 +149,4 @@ u64 intel_generic_uncore_pci_read_counter(struct intel_uncore_box *box,
 					  struct perf_event *event);
 
 struct intel_uncore_type **
-intel_uncore_generic_init_uncores(enum uncore_access_type type_id);
+intel_uncore_generic_init_uncores(enum uncore_access_type type_id, int num_extra);
diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c
index 2696657..f530fc5 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -5728,6 +5728,8 @@ static struct intel_uncore_type spr_uncore_mdf = {
 };
 
 #define UNCORE_SPR_NUM_UNCORE_TYPES		12
+#define UNCORE_SPR_IIO				1
+#define UNCORE_SPR_IMC				6
 
 static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
 	&spr_uncore_chabox,
@@ -5744,6 +5746,145 @@ static struct intel_uncore_type *spr_uncores[UNCORE_SPR_NUM_UNCORE_TYPES] = {
 	&spr_uncore_mdf,
 };
 
+enum perf_uncore_spr_iio_freerunning_type_id {
+	SPR_IIO_MSR_IOCLK,
+	SPR_IIO_MSR_BW_IN,
+	SPR_IIO_MSR_BW_OUT,
+
+	SPR_IIO_FREERUNNING_TYPE_MAX,
+};
+
+static struct freerunning_counters spr_iio_freerunning[] = {
+	[SPR_IIO_MSR_IOCLK]	= { 0x340e, 0x1, 0x10, 1, 48 },
+	[SPR_IIO_MSR_BW_IN]	= { 0x3800, 0x1, 0x10, 8, 48 },
+	[SPR_IIO_MSR_BW_OUT]	= { 0x3808, 0x1, 0x10, 8, 48 },
+};
+
+static struct uncore_event_desc spr_uncore_iio_freerunning_events[] = {
+	/* Free-Running IIO CLOCKS Counter */
+	INTEL_UNCORE_EVENT_DESC(ioclk,			"event=0xff,umask=0x10"),
+	/* Free-Running IIO BANDWIDTH IN Counters */
+	INTEL_UNCORE_EVENT_DESC(bw_in_port0,		"event=0xff,umask=0x20"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port0.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port0.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port1,		"event=0xff,umask=0x21"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port1.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port1.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port2,		"event=0xff,umask=0x22"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port2.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port2.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port3,		"event=0xff,umask=0x23"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port3.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port3.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port4,		"event=0xff,umask=0x24"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port4.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port4.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port5,		"event=0xff,umask=0x25"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port5.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port5.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port6,		"event=0xff,umask=0x26"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port6.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port6.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port7,		"event=0xff,umask=0x27"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port7.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_in_port7.unit,	"MiB"),
+	/* Free-Running IIO BANDWIDTH OUT Counters */
+	INTEL_UNCORE_EVENT_DESC(bw_out_port0,		"event=0xff,umask=0x30"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port0.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port0.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port1,		"event=0xff,umask=0x31"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port1.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port1.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port2,		"event=0xff,umask=0x32"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port2.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port2.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port3,		"event=0xff,umask=0x33"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port3.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port3.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port4,		"event=0xff,umask=0x34"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port4.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port4.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port5,		"event=0xff,umask=0x35"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port5.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port5.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port6,		"event=0xff,umask=0x36"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port6.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port6.unit,	"MiB"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port7,		"event=0xff,umask=0x37"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port7.scale,	"3.814697266e-6"),
+	INTEL_UNCORE_EVENT_DESC(bw_out_port7.unit,	"MiB"),
+	{ /* end: all zeroes */ },
+};
+
+static struct intel_uncore_type spr_uncore_iio_free_running = {
+	.name			= "iio_free_running",
+	.num_counters		= 17,
+	.num_freerunning_types	= SPR_IIO_FREERUNNING_TYPE_MAX,
+	.freerunning		= spr_iio_freerunning,
+	.ops			= &skx_uncore_iio_freerunning_ops,
+	.event_descs		= spr_uncore_iio_freerunning_events,
+	.format_group		= &skx_uncore_iio_freerunning_format_group,
+};
+
+enum perf_uncore_spr_imc_freerunning_type_id {
+	SPR_IMC_DCLK,
+	SPR_IMC_PQ_CYCLES,
+
+	SPR_IMC_FREERUNNING_TYPE_MAX,
+};
+
+static struct freerunning_counters spr_imc_freerunning[] = {
+	[SPR_IMC_DCLK]		= { 0x22b0, 0x0, 0, 1, 48 },
+	[SPR_IMC_PQ_CYCLES]	= { 0x2318, 0x8, 0, 2, 48 },
+};
+
+static struct uncore_event_desc spr_uncore_imc_freerunning_events[] = {
+	INTEL_UNCORE_EVENT_DESC(dclk,			"event=0xff,umask=0x10"),
+
+	INTEL_UNCORE_EVENT_DESC(rpq_cycles,		"event=0xff,umask=0x20"),
+	INTEL_UNCORE_EVENT_DESC(wpq_cycles,		"event=0xff,umask=0x21"),
+	{ /* end: all zeroes */ },
+};
+
+#define SPR_MC_DEVICE_ID	0x3251
+
+static void spr_uncore_imc_freerunning_init_box(struct intel_uncore_box *box)
+{
+	int mem_offset = box->pmu->pmu_idx * ICX_IMC_MEM_STRIDE + SNR_IMC_MMIO_MEM0_OFFSET;
+
+	snr_uncore_mmio_map(box, uncore_mmio_box_ctl(box),
+			    mem_offset, SPR_MC_DEVICE_ID);
+}
+
+static struct intel_uncore_ops spr_uncore_imc_freerunning_ops = {
+	.init_box	= spr_uncore_imc_freerunning_init_box,
+	.exit_box	= uncore_mmio_exit_box,
+	.read_counter	= uncore_mmio_read_counter,
+	.hw_config	= uncore_freerunning_hw_config,
+};
+
+static struct intel_uncore_type spr_uncore_imc_free_running = {
+	.name			= "imc_free_running",
+	.num_counters		= 3,
+	.mmio_map_size		= SNR_IMC_MMIO_SIZE,
+	.num_freerunning_types	= SPR_IMC_FREERUNNING_TYPE_MAX,
+	.freerunning		= spr_imc_freerunning,
+	.ops			= &spr_uncore_imc_freerunning_ops,
+	.event_descs		= spr_uncore_imc_freerunning_events,
+	.format_group		= &skx_uncore_iio_freerunning_format_group,
+};
+
+#define UNCORE_SPR_MSR_EXTRA_UNCORES		1
+#define UNCORE_SPR_MMIO_EXTRA_UNCORES		1
+
+static struct intel_uncore_type *spr_msr_uncores[UNCORE_SPR_MSR_EXTRA_UNCORES] = {
+	&spr_uncore_iio_free_running,
+};
+
+static struct intel_uncore_type *spr_mmio_uncores[UNCORE_SPR_MMIO_EXTRA_UNCORES] = {
+	&spr_uncore_imc_free_running,
+};
+
 static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
 					struct intel_uncore_type *from_type)
 {
@@ -5779,11 +5920,13 @@ static void uncore_type_customized_copy(struct intel_uncore_type *to_type,
 }
 
 static struct intel_uncore_type **
-uncore_get_uncores(enum uncore_access_type type_id)
+uncore_get_uncores(enum uncore_access_type type_id, int num_extra,
+		    struct intel_uncore_type **extra)
 {
 	struct intel_uncore_type **types, **start_types;
+	int i;
 
-	start_types = types = intel_uncore_generic_init_uncores(type_id);
+	start_types = types = intel_uncore_generic_init_uncores(type_id, num_extra);
 
 	/* Only copy the customized features */
 	for (; *types; types++) {
@@ -5792,23 +5935,58 @@ uncore_get_uncores(enum uncore_access_type type_id)
 		uncore_type_customized_copy(*types, spr_uncores[(*types)->type_id]);
 	}
 
+	for (i = 0; i < num_extra; i++, types++)
+		*types = extra[i];
+
 	return start_types;
 }
 
+static struct intel_uncore_type *
+uncore_find_type_by_id(struct intel_uncore_type **types, int type_id)
+{
+	for (; *types; types++) {
+		if (type_id == (*types)->type_id)
+			return *types;
+	}
+
+	return NULL;
+}
+
 void spr_uncore_cpu_init(void)
 {
-	uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR);
+	struct intel_uncore_type *iio;
+
+	uncore_msr_uncores = uncore_get_uncores(UNCORE_ACCESS_MSR,
+						UNCORE_SPR_MSR_EXTRA_UNCORES,
+						spr_msr_uncores);
+
+	iio = uncore_find_type_by_id(uncore_msr_uncores, UNCORE_SPR_IIO);
+	if (iio)
+		spr_uncore_iio_free_running.num_boxes = iio->num_boxes;
 }
 
 int spr_uncore_pci_init(void)
 {
-	uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI);
+	uncore_pci_uncores = uncore_get_uncores(UNCORE_ACCESS_PCI, 0, NULL);
 	return 0;
 }
 
 void spr_uncore_mmio_init(void)
 {
-	uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO);
+	struct intel_uncore_type *imc;
+
+	int ret = snbep_pci2phy_map_init(0x3250, SKX_CPUNODEID, SKX_GIDNIDMAP, true);
+
+	if (ret)
+		uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO, 0, NULL);
+	else {
+		uncore_mmio_uncores = uncore_get_uncores(UNCORE_ACCESS_MMIO,
+							 UNCORE_SPR_MMIO_EXTRA_UNCORES,
+							 spr_mmio_uncores);
+		imc = uncore_find_type_by_id(uncore_mmio_uncores, UNCORE_SPR_IMC);
+		if (imc)
+			spr_uncore_imc_free_running.num_boxes = imc->num_boxes / 2;
+	}
 }
 
 /* end of SPR uncore support */
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* [PATCH 7/7] perf/x86/intel/uncore: Fix invalid unit check
  2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
                   ` (5 preceding siblings ...)
  2021-06-24  1:22 ` [PATCH 6/7] perf/x86/intel/uncore: Support free-running counters on Sapphire Rapids server kan.liang
@ 2021-06-24  1:22 ` kan.liang
  6 siblings, 0 replies; 33+ messages in thread
From: kan.liang @ 2021-06-24  1:22 UTC (permalink / raw)
  To: peterz, mingo, linux-kernel
  Cc: eranian, namhyung, acme, jolsa, ak, Kan Liang, stable

From: Kan Liang <kan.liang@linux.intel.com>

The uncore unit with the type ID 0 and the unit ID 0 is missed.

The table3 of the uncore unit maybe 0. The
uncore_discovery_invalid_unit() mistakenly treated it as an invalid
value.

Remove the !unit.table3 check.

Fixes: edae1f06c2cd ("perf/x86/intel/uncore: Parse uncore discovery tables")
Reviewed-by: Andi Kleen <ak@linux.intel.com>
Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
Cc: stable@vger.kernel.org
---
 arch/x86/events/intel/uncore_discovery.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/events/intel/uncore_discovery.h b/arch/x86/events/intel/uncore_discovery.h
index 7280c8a..6d735611 100644
--- a/arch/x86/events/intel/uncore_discovery.h
+++ b/arch/x86/events/intel/uncore_discovery.h
@@ -30,7 +30,7 @@
 
 
 #define uncore_discovery_invalid_unit(unit)			\
-	(!unit.table1 || !unit.ctl || !unit.table3 ||	\
+	(!unit.table1 || !unit.ctl || \
 	 unit.table1 == -1ULL || unit.ctl == -1ULL ||	\
 	 unit.table3 == -1ULL)
 
-- 
2.7.4


^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 1/7] driver core: Add a way to get to bus devices kset
  2021-06-24  1:22 ` [PATCH 1/7] driver core: Add a way to get to bus devices kset kan.liang
@ 2021-06-24  5:41   ` Greg KH
  0 siblings, 0 replies; 33+ messages in thread
From: Greg KH @ 2021-06-24  5:41 UTC (permalink / raw)
  To: kan.liang
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa, ak,
	rafael.j.wysocki

On Wed, Jun 23, 2021 at 06:22:03PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> Add an accessor function to get to the bus devices kset associated with
> a struct bus_type.
> 
> The user of this is the following perf changes, which will need to get
> to the kobj of the 'devices' directory of a certain bus.

What "following perf changes"?

Nothing should be messing with the kobject of a bus, where are those
patches?

> Reviewed-by: Rafael J. Wysocki <rafael.j.wysocki@intel.com>
> Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
> Cc: gregkh@linuxfoundation.org
> Cc: rafael.j.wysocki@intel.com
> ---
>  drivers/base/bus.c         | 6 ++++++
>  include/linux/device/bus.h | 1 +
>  2 files changed, 7 insertions(+)
> 
> diff --git a/drivers/base/bus.c b/drivers/base/bus.c
> index 36d0c65..3d621a8 100644
> --- a/drivers/base/bus.c
> +++ b/drivers/base/bus.c
> @@ -899,6 +899,12 @@ struct kset *bus_get_kset(struct bus_type *bus)
>  }
>  EXPORT_SYMBOL_GPL(bus_get_kset);
>  
> +struct kset *bus_get_devices_kset(struct bus_type *bus)
> +{
> +	return bus->p->devices_kset;
> +}
> +EXPORT_SYMBOL_GPL(bus_get_devices_kset);

No, sorry, this feels really wrong, why does anyone care about the bus
kset?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU
  2021-06-24  1:22 ` [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU kan.liang
@ 2021-06-24  5:44   ` Greg KH
  0 siblings, 0 replies; 33+ messages in thread
From: Greg KH @ 2021-06-24  5:44 UTC (permalink / raw)
  To: kan.liang; +Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa, ak

On Wed, Jun 23, 2021 at 06:22:05PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> The platform specific support for Sapphire Rapids will apply a
> meaningful name for each uncore PMU. The script which works well with
> the old name may not work anymore because of the name change. To avoid
> the issue, a symlink should be created from the new name to the old
> name.

What script needs a specific name?

And where in Documentation/ABI/ are you documenting this change and new
symlink?

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24  1:22 ` [PATCH 2/7] perf: Create a symlink for a PMU kan.liang
@ 2021-06-24  5:48   ` Greg KH
  2021-06-24 14:24     ` Andi Kleen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-24  5:48 UTC (permalink / raw)
  To: kan.liang; +Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa, ak

On Wed, Jun 23, 2021 at 06:22:04PM -0700, kan.liang@linux.intel.com wrote:
> From: Kan Liang <kan.liang@linux.intel.com>
> 
> A perf PMU may have two PMU names. For example, Intel Sapphire Rapids
> server supports the discovery mechanism. Without the platform-specific
> support, an uncore PMU is named by a type ID plus a box ID, e.g.,
> uncore_type_0_0, because the real name of the uncore PMU cannot be
> retrieved from the discovery table. With the platform-specific support
> later, perf has the mapping information from a type ID to a specific
> uncore unit. Just like the previous platforms, the uncore PMU will be
> named by the real PMU name, e.g., uncore_cha_0. The user scripts which
> work well with the old name may not work anymore. To avoid the issue, a
> symlink should be created from the new to the old name.
> 
> The perf PMU devices are created under /sys/bus/event_source/devices/.
> The symlink should be created in the same directory to facilitate the
> perf tool.
> 
> Add a new variable, link_name, to store the new name of a PMU. Link it
> to the old name if applies.

Ah, here is the "new name".  This needs to be documented.

But first off, why is this symlink suddenly needed?  What is so special
about this new hardware that it breaks the existing model?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24  5:48   ` Greg KH
@ 2021-06-24 14:24     ` Andi Kleen
  2021-06-24 14:29       ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-24 14:24 UTC (permalink / raw)
  To: Greg KH, kan.liang
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> But first off, why is this symlink suddenly needed?  What is so special
> about this new hardware that it breaks the existing model?

The driver can be in two modes:

- Driver fully knows the hardware and puts in the correct Linux names

- Driver doesn't know the hardware but is in a fallback mode where it 
only looks at a discovery table. There we don't have the correct names, 
just an numeric identifier for the different hardware sub components.

In the later mode the numeric identifier is used in sysfs, in the former 
case the full Linux name. But we want to keep some degree of Linux user 
space compatibility between the two, that is why the full mode creates a 
symlink from the "numeric" name. This way the (ugly) identifiers needed 
for the fallback mode work everywhere.

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 14:24     ` Andi Kleen
@ 2021-06-24 14:29       ` Greg KH
  2021-06-24 15:24         ` Andi Kleen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-24 14:29 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> 
> > But first off, why is this symlink suddenly needed?  What is so special
> > about this new hardware that it breaks the existing model?
> 
> The driver can be in two modes:
> 
> - Driver fully knows the hardware and puts in the correct Linux names
> 
> - Driver doesn't know the hardware but is in a fallback mode where it only
> looks at a discovery table. There we don't have the correct names, just an
> numeric identifier for the different hardware sub components.

Why does this matter?  Why would the driver not "know" the hardware?  If
it doesn't know it, why would it bind to it?

> In the later mode the numeric identifier is used in sysfs, in the former
> case the full Linux name. But we want to keep some degree of Linux user
> space compatibility between the two, that is why the full mode creates a
> symlink from the "numeric" name. This way the (ugly) identifiers needed for
> the fallback mode work everywhere.

So what _exactly_ does the symlink do here?  What is it from->to?

And where is it being documented?  What userspace tool needs to be fixed
up so that the symlink can be removed?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 14:29       ` Greg KH
@ 2021-06-24 15:24         ` Andi Kleen
  2021-06-24 15:31           ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-24 15:24 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


On 6/24/2021 7:29 AM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
>>> But first off, why is this symlink suddenly needed?  What is so special
>>> about this new hardware that it breaks the existing model?
>> The driver can be in two modes:
>>
>> - Driver fully knows the hardware and puts in the correct Linux names
>>
>> - Driver doesn't know the hardware but is in a fallback mode where it only
>> looks at a discovery table. There we don't have the correct names, just an
>> numeric identifier for the different hardware sub components.
> Why does this matter?  Why would the driver not "know" the hardware?  If
> it doesn't know it, why would it bind to it?

It's a similar concept as a PCI class. How to have a driver that can 
handle future hardware, but with some restrictions

The perf CPU PMU has had a similar concept for a long time. The driver 
can be either in architectural mode (with a subset of features), or be 
fully enabled. This allows users who are on an older kernel to still use 
at least a subset of the functionality.

It will bind as long as the discovery table is there.

>
>> In the later mode the numeric identifier is used in sysfs, in the former
>> case the full Linux name. But we want to keep some degree of Linux user
>> space compatibility between the two, that is why the full mode creates a
>> symlink from the "numeric" name. This way the (ugly) identifiers needed for
>> the fallback mode work everywhere.
> So what _exactly_ does the symlink do here?  What is it from->to?

It's from numeric identifier to full perf name

In fallback mode there is no symlink, only the numeric identifier.


>
> And where is it being documented?  What userspace tool needs to be fixed
> up so that the symlink can be removed?

The names are visible in the perf command lines. Perf supports either 
name without changes. So it's not about fixing a specific tool, but 
about using the drivers in both modes, with limited compatibility 
between the two.

Yes probably it needs better documentation.

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 15:24         ` Andi Kleen
@ 2021-06-24 15:31           ` Greg KH
  2021-06-24 17:07             ` Liang, Kan
  2021-06-24 17:28             ` Andi Kleen
  0 siblings, 2 replies; 33+ messages in thread
From: Greg KH @ 2021-06-24 15:31 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
> 
> On 6/24/2021 7:29 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > But first off, why is this symlink suddenly needed?  What is so special
> > > > about this new hardware that it breaks the existing model?
> > > The driver can be in two modes:
> > > 
> > > - Driver fully knows the hardware and puts in the correct Linux names
> > > 
> > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > looks at a discovery table. There we don't have the correct names, just an
> > > numeric identifier for the different hardware sub components.
> > Why does this matter?  Why would the driver not "know" the hardware?  If
> > it doesn't know it, why would it bind to it?
> 
> It's a similar concept as a PCI class. How to have a driver that can handle
> future hardware, but with some restrictions

But this is NOT how busses work in the driver model.

PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
a driver goes from being handled by "generic_pci_type_foo" to
"vendor_foo".  Userspace can handle the change and life goes on.

> The perf CPU PMU has had a similar concept for a long time. The driver can
> be either in architectural mode (with a subset of features), or be fully
> enabled. This allows users who are on an older kernel to still use at least
> a subset of the functionality.

So a device name will move from "generic" to "specific", right?

Why does a bus have to do with any of this?

> > > In the later mode the numeric identifier is used in sysfs, in the former
> > > case the full Linux name. But we want to keep some degree of Linux user
> > > space compatibility between the two, that is why the full mode creates a
> > > symlink from the "numeric" name. This way the (ugly) identifiers needed for
> > > the fallback mode work everywhere.
> > So what _exactly_ does the symlink do here?  What is it from->to?
> 
> It's from numeric identifier to full perf name
> 
> In fallback mode there is no symlink, only the numeric identifier.

Those two sentences do not describe a sysfs path to me.

Where are the Documentation/ABI/ entries for all of this?

> > And where is it being documented?  What userspace tool needs to be fixed
> > up so that the symlink can be removed?
> 
> The names are visible in the perf command lines. Perf supports either name
> without changes. So it's not about fixing a specific tool, but about using
> the drivers in both modes, with limited compatibility between the two.

But a driver does not caer.  And if perf does not care, who cares?

still totally confused,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 15:31           ` Greg KH
@ 2021-06-24 17:07             ` Liang, Kan
  2021-06-24 17:35               ` Greg KH
  2021-06-25  5:17               ` Greg KH
  2021-06-24 17:28             ` Andi Kleen
  1 sibling, 2 replies; 33+ messages in thread
From: Liang, Kan @ 2021-06-24 17:07 UTC (permalink / raw)
  To: Greg KH, Andi Kleen
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa



On 6/24/2021 11:31 AM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
>>
>> On 6/24/2021 7:29 AM, Greg KH wrote:
>>> On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
>>>>> But first off, why is this symlink suddenly needed?  What is so special
>>>>> about this new hardware that it breaks the existing model?
>>>> The driver can be in two modes:
>>>>
>>>> - Driver fully knows the hardware and puts in the correct Linux names
>>>>
>>>> - Driver doesn't know the hardware but is in a fallback mode where it only
>>>> looks at a discovery table. There we don't have the correct names, just an
>>>> numeric identifier for the different hardware sub components.
>>> Why does this matter?  Why would the driver not "know" the hardware?  If
>>> it doesn't know it, why would it bind to it?
>>
>> It's a similar concept as a PCI class. How to have a driver that can handle
>> future hardware, but with some restrictions
> 
> But this is NOT how busses work in the driver model.
> 
> PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> a driver goes from being handled by "generic_pci_type_foo" to
> "vendor_foo".  Userspace can handle the change and life goes on.
> 
>> The perf CPU PMU has had a similar concept for a long time. The driver can
>> be either in architectural mode (with a subset of features), or be fully
>> enabled. This allows users who are on an older kernel to still use at least
>> a subset of the functionality.
> 
> So a device name will move from "generic" to "specific", right?

We'd like to keep both names.

Here is the whole story.

This patch set provides platform-specific support for Sapphire Rapids 
server on top of the discovery mechanism which was recently introduced 
in the uncore driver.

https://lore.kernel.org/lkml/1616003977-90612-1-git-send-email-kan.liang@linux.intel.com/

The discovery mechanism is a mechanism of self-describing HW. By reading 
through an MMIO page worth of information, the uncore driver can 
‘discover’ all the standard uncore PMON registers. The driver "know" the 
hardware.
However, the discovery mechanism only provides basic uncore information. 
It doesn't provide a meaningful name for each uncore unit, but a type 
ID. So we can only name an uncore PMU by it's type ID (numeric name), 
e.g., uncore_type_0_0. End uses can use it to access the uncore counters.

With this platform-specific patch, the driver can "know" more about the 
hardware, e.g., the mapping between a type ID and an meaningful uncore 
name. It can creates an uncore PMU with a meaningful name just like the 
previous platforms.

It usually takes some time to implement a platform-specific patch. Some 
early user or users still have old drivers may have already wrote some 
scripts (which composed of many perf commands for core and uncore 
events) with the numeric name, and done some basic test. We don't want 
to bother them to update their scripts for the new driver. So we'd like 
to keep both names and add a symlink from a meaningful name to a type ID 
name. For example,

/sys/bus/event_source/devices/uncore_iio_0 -> 
../../../devices/uncore_type_1_0

> 
> Why does a bus have to do with any of this?

Both names should be created under /sys/bus/event_source/devices/

Exposing the bus_get_devices_kset() is to get the bus devices kset, 
which is used to create the symlink.

> 
>>>> In the later mode the numeric identifier is used in sysfs, in the former
>>>> case the full Linux name. But we want to keep some degree of Linux user
>>>> space compatibility between the two, that is why the full mode creates a
>>>> symlink from the "numeric" name. This way the (ugly) identifiers needed for
>>>> the fallback mode work everywhere.
>>> So what _exactly_ does the symlink do here?  What is it from->to?
>>
>> It's from numeric identifier to full perf name
>>
>> In fallback mode there is no symlink, only the numeric identifier.
> 
> Those two sentences do not describe a sysfs path to me.
> 
> Where are the Documentation/ABI/ entries for all of this?

The below patch created a file for uncore to explain the naming rules.
I also found that the current sysfs-devices-mapping is for uncore driver 
as well. So I move it into the uncore document.


> 
>>> And where is it being documented?  What userspace tool needs to be fixed
>>> up so that the symlink can be removed?
>>
>> The names are visible in the perf command lines. Perf supports either name
>> without changes. So it's not about fixing a specific tool, but about using
>> the drivers in both modes, with limited compatibility between the two.
> 
> But a driver does not caer.  And if perf does not care, who cares?
>

That's to facilitate the user's script which composed of many perf 
commands for core and uncore events.

They may run the script on both the old driver (without 
platform-specific support) and the new driver (with platform-specific 
support). The numeric name with the type ID is not deprecated. So they 
don't need to update the uncore PMU names in their scripts when updating 
the driver.



 From f6989d87988866994ed4e198be16aa44e3f682eb Mon Sep 17 00:00:00 2001
From: Kan Liang <kan.liang@linux.intel.com>
Date: Thu, 24 Jun 2021 09:16:21 -0700
Subject: [PATCH] docs: ABI: testing: Add perf uncore PMU support

Document the naming rules for the uncore related PMUs.

Move the sysfs-devices-mapping into this uncore document. The attribute
is only available for uncore PMUs. Update the location of the attribute
to /sys/bus/event_source/devices/. Update the Contact which is not
available anymore.

Signed-off-by: Kan Liang <kan.liang@linux.intel.com>
---
  .../testing/sysfs-bus-event_source-devices-uncore  | 126 
+++++++++++++++++++++
  Documentation/ABI/testing/sysfs-devices-mapping    |  34 ------
  2 files changed, 126 insertions(+), 34 deletions(-)
  create mode 100644 
Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
  delete mode 100644 Documentation/ABI/testing/sysfs-devices-mapping

diff --git 
a/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore 
b/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
new file mode 100644
index 0000000..612fa19
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-bus-event_source-devices-uncore
@@ -0,0 +1,126 @@
+What:		/sys/bus/event_source/devices/uncore_<unit_name>[_<unit_idx>]
+Date:		June 2021
+KernelVersion:  3.5
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Generic Uncore performance monitoring unit (PMU)
+
+		A collection of uncore performance monitoring units that
+		may be supported. These units can be monitored using the
+		'perf(1)' tool.
+
+		The name of each uncore PMU would look like:
+
+			uncore_<unit_name>[_<unit_idx>]
+
+		Where <unit_name> is the name of the uncore PMU.
+		If the unit number is more than one, <unit_idx>
+		indicates the index.
+
+		Examples:
+
+			The second uncore PMU for the Caching/Home Agent.
+
+				uncore_cha_2
+
+			There is only one uncore PMU for the Power
+			Control Unit
+
+				uncore_pcu
+
+
+		A mechanism of self-describing HW for the uncore PMU has
+		been introduced since the Intel Sapphire Rapids server.
+		Perf can provide basic uncore support based on the
+		generic uncore unit information provided by the
+		discovery mechanism. But the discovery mechanism only
+		provides a unit type ID for an uncore PMU, not a unit
+		name. The unit name can only be parsed with the later
+		platform-specific support.
+
+		On a platform that supports the discovery mechanism,
+		but lacks platform-specific support, the type ID is used
+		to name a uncore PMU.
+
+			uncore_type_<type_ID>[_<unit_idx>]
+
+		Examples:
+
+			The second uncore PMU for the type 0 unit
+
+				uncore_type_0_2
+
+		When the platform-specific support is ready on the
+		platform that supports the discovery mechanism. The unit
+		name should be used to name the uncore PMU. But the
+		numeric name with the type ID is not deprecated. Both
+		names coexist. A symlink is created to link the two
+		names.
+
+		Examples:
+
+			/sys/bus/event_source/devices/uncore_iio_0 ->
+			../../../devices/uncore_type_1_0
+
+		For users, who may use both the old driver (without
+		platform-specific support) and the new driver (with
+		platform-specific support), the numeric name with the
+		type ID is recommended, since they don't need to update
+		the uncore PMU names in their scripts when updating the
+		driver.
+
+		The attribute groups created under the PMU are similar to
+		other core PMUs.
+		(See ABI/testing/sysfs-bus-event_source-devices-format and
+		 ABI/testing/sysfs-bus-event_source-devices-events).
+
+
+What:	 
/sys/bus/event_source/devices/uncore_<unit_name>_free_running_<unit_idx>
+Date:		June 2021
+KernelVersion:  4.17
+Contact:	Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:	Uncore Free-running performance monitoring unit (PMU)
+
+		A collection of uncore Free-running performance
+		monitoring units that may be supported. These units can
+		be monitored using the 'perf(1)' tool.
+
+		The attribute groups created under the PMU are similar to
+		other core PMUs.
+		(See ABI/testing/sysfs-bus-event_source-devices-format and
+		 ABI/testing/sysfs-bus-event_source-devices-events).
+
+What:           /sys/bus/event_source/devices/uncore_iio_x/dieX
+Date:           February 2020
+Contact:        Linux kernel mailing list <linux-kernel@vger.kernel.org>
+Description:
+                Each IIO stack (PCIe root port) has its own IIO PMON 
block, so
+                each dieX file (where X is die number) holds 
"Segment:Root Bus"
+                for PCIe root port, which can be monitored by that IIO PMON
+                block.
+                For example, on 4-die Xeon platform with up to 6 IIO 
stacks per
+                die and, therefore, 6 IIO PMON blocks per die, the 
mapping of
+                IIO PMON block 0 exposes as the following::
+
+		    $ ls /sys/bus/event_source/devices/uncore_iio_0/die*
+		    -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die0
+		    -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die1
+		    -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die2
+		    -r--r--r-- /sys/bus/event_source/devices/uncore_iio_0/die3
+
+		    $ tail /sys/bus/event_source/devices/uncore_iio_0/die*
+		    ==> /sys/bus/event_source/devices/uncore_iio_0/die0 <==
+		    0000:00
+		    ==> /sys/bus/event_source/devices/uncore_iio_0/die1 <==
+		    0000:40
+		    ==> /sys/bus/event_source/devices/uncore_iio_0/die2 <==
+		    0000:80
+		    ==> /sys/bus/event_source/devices/uncore_iio_0/die3 <==
+		    0000:c0
+
+                Which means::
+
+		    IIO PMU 0 on die 0 belongs to PCI RP on bus 0x00, domain 0x0000
+		    IIO PMU 0 on die 1 belongs to PCI RP on bus 0x40, domain 0x0000
+		    IIO PMU 0 on die 2 belongs to PCI RP on bus 0x80, domain 0x0000
+		    IIO PMU 0 on die 3 belongs to PCI RP on bus 0xc0, domain 0x0000
+
diff --git a/Documentation/ABI/testing/sysfs-devices-mapping 
b/Documentation/ABI/testing/sysfs-devices-mapping
deleted file mode 100644
index 8d202ba..0000000
--- a/Documentation/ABI/testing/sysfs-devices-mapping
+++ /dev/null
@@ -1,34 +0,0 @@
-What:           /sys/devices/uncore_iio_x/dieX
-Date:           February 2020
-Contact:        Roman Sudarikov <roman.sudarikov@linux.intel.com>
-Description:
-                Each IIO stack (PCIe root port) has its own IIO PMON 
block, so
-                each dieX file (where X is die number) holds 
"Segment:Root Bus"
-                for PCIe root port, which can be monitored by that IIO PMON
-                block.
-                For example, on 4-die Xeon platform with up to 6 IIO 
stacks per
-                die and, therefore, 6 IIO PMON blocks per die, the 
mapping of
-                IIO PMON block 0 exposes as the following::
-
-		    $ ls /sys/devices/uncore_iio_0/die*
-		    -r--r--r-- /sys/devices/uncore_iio_0/die0
-		    -r--r--r-- /sys/devices/uncore_iio_0/die1
-		    -r--r--r-- /sys/devices/uncore_iio_0/die2
-		    -r--r--r-- /sys/devices/uncore_iio_0/die3
-
-		    $ tail /sys/devices/uncore_iio_0/die*
-		    ==> /sys/devices/uncore_iio_0/die0 <==
-		    0000:00
-		    ==> /sys/devices/uncore_iio_0/die1 <==
-		    0000:40
-		    ==> /sys/devices/uncore_iio_0/die2 <==
-		    0000:80
-		    ==> /sys/devices/uncore_iio_0/die3 <==
-		    0000:c0
-
-                Which means::
-
-		    IIO PMU 0 on die 0 belongs to PCI RP on bus 0x00, domain 0x0000
-		    IIO PMU 0 on die 1 belongs to PCI RP on bus 0x40, domain 0x0000
-		    IIO PMU 0 on die 2 belongs to PCI RP on bus 0x80, domain 0x0000
-		    IIO PMU 0 on die 3 belongs to PCI RP on bus 0xc0, domain 0x0000
-- 
2.7.4

Thanks,
Kan



^ permalink raw reply related	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 15:31           ` Greg KH
  2021-06-24 17:07             ` Liang, Kan
@ 2021-06-24 17:28             ` Andi Kleen
  2021-06-25  5:19               ` Greg KH
  1 sibling, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-24 17:28 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> But this is NOT how busses work in the driver model.
>
> PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> a driver goes from being handled by "generic_pci_type_foo" to
> "vendor_foo".  Userspace can handle the change and life goes on.

In perf this is exposed to the user command line, and lots of people 
configure their own events, so it's very common in scripts. To use the 
pmu you have to use something like perf stat -a -e uncore_foo/.../. So 
it's not a single thing that could be patched up.

The perf tools supports PMUs abstractly and doesn't have special 
handling for every PMU. Also the perf design was always that the kernel 
should provide these abstractions with the user tool being (mostly) 
generic and abstract.


> So a device name will move from "generic" to "specific", right?
>
> Why does a bus have to do with any of this?

The perf pmus are in /sys/devices, so the symlinks for compatibility 
have to be created there.


> But a driver does not caer.  And if perf does not care, who cares?


The users who write scripts that specify the perf events on the perf 
command line.


-Andi



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 17:07             ` Liang, Kan
@ 2021-06-24 17:35               ` Greg KH
  2021-06-24 17:49                 ` Andi Kleen
  2021-06-25  5:17               ` Greg KH
  1 sibling, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-24 17:35 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Andi Kleen, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 01:07:14PM -0400, Liang, Kan wrote:
> 
> 
> On 6/24/2021 11:31 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
> > > 
> > > On 6/24/2021 7:29 AM, Greg KH wrote:
> > > > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > > > But first off, why is this symlink suddenly needed?  What is so special
> > > > > > about this new hardware that it breaks the existing model?
> > > > > The driver can be in two modes:
> > > > > 
> > > > > - Driver fully knows the hardware and puts in the correct Linux names
> > > > > 
> > > > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > > > looks at a discovery table. There we don't have the correct names, just an
> > > > > numeric identifier for the different hardware sub components.
> > > > Why does this matter?  Why would the driver not "know" the hardware?  If
> > > > it doesn't know it, why would it bind to it?
> > > 
> > > It's a similar concept as a PCI class. How to have a driver that can handle
> > > future hardware, but with some restrictions
> > 
> > But this is NOT how busses work in the driver model.
> > 
> > PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> > a driver goes from being handled by "generic_pci_type_foo" to
> > "vendor_foo".  Userspace can handle the change and life goes on.
> > 
> > > The perf CPU PMU has had a similar concept for a long time. The driver can
> > > be either in architectural mode (with a subset of features), or be fully
> > > enabled. This allows users who are on an older kernel to still use at least
> > > a subset of the functionality.
> > 
> > So a device name will move from "generic" to "specific", right?
> 
> We'd like to keep both names.

That is not how sysfs and the driver model works, sorry.  You don't get
to keep both names, otherwise sysfs would be even more of a mess than it
currently is.  What happens if you need "another" name in the future?
When do you stop?

this isn't ok, please do it right.

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 17:35               ` Greg KH
@ 2021-06-24 17:49                 ` Andi Kleen
  2021-06-25  5:18                   ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-24 17:49 UTC (permalink / raw)
  To: Greg KH, Liang, Kan
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> That is not how sysfs and the driver model works, sorry.  You don't get
> to keep both names, otherwise sysfs would be even more of a mess than it
> currently is.  What happens if you need "another" name in the future?
> When do you stop

I don't see any scenario where we would ever need more than a single 
symlink.

I believe there is already precedent for this elsewhere.

>
> this isn't ok, please do it right.

I don't see what exactly are you proposing.

Are you proposing to break every perf script on a kernel update? Doesn't 
seem acceptable to me.

Or move the compatibility into the perf tool? That would require the 
users to both update the perf tool and the kernel. I suppose it would be 
possible, but it would be totally against the standard perf abstraction 
design. Besides there are other consumers of this than just the perf 
tool, so it could possibly have a wide impact.

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 17:07             ` Liang, Kan
  2021-06-24 17:35               ` Greg KH
@ 2021-06-25  5:17               ` Greg KH
  1 sibling, 0 replies; 33+ messages in thread
From: Greg KH @ 2021-06-25  5:17 UTC (permalink / raw)
  To: Liang, Kan
  Cc: Andi Kleen, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 01:07:14PM -0400, Liang, Kan wrote:
> 
> 
> On 6/24/2021 11:31 AM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 08:24:29AM -0700, Andi Kleen wrote:
> > > 
> > > On 6/24/2021 7:29 AM, Greg KH wrote:
> > > > On Thu, Jun 24, 2021 at 07:24:31AM -0700, Andi Kleen wrote:
> > > > > > But first off, why is this symlink suddenly needed?  What is so special
> > > > > > about this new hardware that it breaks the existing model?
> > > > > The driver can be in two modes:
> > > > > 
> > > > > - Driver fully knows the hardware and puts in the correct Linux names
> > > > > 
> > > > > - Driver doesn't know the hardware but is in a fallback mode where it only
> > > > > looks at a discovery table. There we don't have the correct names, just an
> > > > > numeric identifier for the different hardware sub components.
> > > > Why does this matter?  Why would the driver not "know" the hardware?  If
> > > > it doesn't know it, why would it bind to it?
> > > 
> > > It's a similar concept as a PCI class. How to have a driver that can handle
> > > future hardware, but with some restrictions
> > 
> > But this is NOT how busses work in the driver model.
> > 
> > PCI classes are great, but we do NOT suddenly add a symlink in sysfs if
> > a driver goes from being handled by "generic_pci_type_foo" to
> > "vendor_foo".  Userspace can handle the change and life goes on.
> > 
> > > The perf CPU PMU has had a similar concept for a long time. The driver can
> > > be either in architectural mode (with a subset of features), or be fully
> > > enabled. This allows users who are on an older kernel to still use at least
> > > a subset of the functionality.
> > 
> > So a device name will move from "generic" to "specific", right?
> 
> We'd like to keep both names.

That is not how the driver model works, sorry.

Stick with the first name you got (i.e. the one in the kernel now).  Do
not try to use symlinks for things that should not be symlinked.

If a device changes names, wonderful, deal with that in userspace, we do
so all the time in tools for busses because device names are never
guaranteed to be the same each boot.  But bus names do not change for
the obvious reason that bus names are not dynamic, you pick them at
build time.

If you have userspace code that relies on device names to be static and
never change, then you need to stick with that as that was your choice
when you created that api.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 17:49                 ` Andi Kleen
@ 2021-06-25  5:18                   ` Greg KH
  0 siblings, 0 replies; 33+ messages in thread
From: Greg KH @ 2021-06-25  5:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Liang, Kan, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 10:49:42AM -0700, Andi Kleen wrote:
> Are you proposing to break every perf script on a kernel update? Doesn't
> seem acceptable to me.

I'm suggesting that you not submit kernel code that breaks the device
names of things that you previously stated would never change.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-24 17:28             ` Andi Kleen
@ 2021-06-25  5:19               ` Greg KH
  2021-06-25 14:22                 ` Andi Kleen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-25  5:19 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
> > But a driver does not caer.  And if perf does not care, who cares?
> 
> The users who write scripts that specify the perf events on the perf command
> line.

Great, then as you have deemed the device name part of your documented
api, then keep it stable as that is what you decided to do a long time
ago when it was created.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25  5:19               ` Greg KH
@ 2021-06-25 14:22                 ` Andi Kleen
  2021-06-25 14:38                   ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-25 14:22 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


On 6/24/2021 10:19 PM, Greg KH wrote:
> On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
>>> But a driver does not caer.  And if perf does not care, who cares?
>> The users who write scripts that specify the perf events on the perf command
>> line.
> Great, then as you have deemed the device name part of your documented
> api, then keep it stable as that is what you decided to do a long time
> ago when it was created.

The fully supported driver keeps it stable, but the driver in fallback 
mode (as in running on a yet unknown system) can't because it doesn't 
have enough information. It has to fall back to the numeric identifiers.

Supporting yet unknown systems is a big advantage, missing full kernel 
support is the number one reason people can't use uncore monitoring today.

The symlink keeps some degree of compatibility between the two.

Anyways thinking about it if Greg doesn't want symlinks (even though 
sysfs already has symlinks elsewhere), maybe we could just create two 
devices without symlinks. Kan, do you think that would work?

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 14:22                 ` Andi Kleen
@ 2021-06-25 14:38                   ` Greg KH
  2021-06-25 14:49                     ` Andi Kleen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-25 14:38 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Fri, Jun 25, 2021 at 07:22:08AM -0700, Andi Kleen wrote:
> 
> On 6/24/2021 10:19 PM, Greg KH wrote:
> > On Thu, Jun 24, 2021 at 10:28:36AM -0700, Andi Kleen wrote:
> > > > But a driver does not caer.  And if perf does not care, who cares?
> > > The users who write scripts that specify the perf events on the perf command
> > > line.
> > Great, then as you have deemed the device name part of your documented
> > api, then keep it stable as that is what you decided to do a long time
> > ago when it was created.
> 
> The fully supported driver keeps it stable, but the driver in fallback mode
> (as in running on a yet unknown system) can't because it doesn't have enough
> information. It has to fall back to the numeric identifiers.
> 
> Supporting yet unknown systems is a big advantage, missing full kernel
> support is the number one reason people can't use uncore monitoring today.
> 
> The symlink keeps some degree of compatibility between the two.

But it creates something that is not needed at the moment, and then
userspace will rely on it.  Don't make userspace rely on it today and
all should be fine.

Device names will change, that's always a given, as the kernel can never
always make them the same.  That's why userspace needs to scan the bus
for all devices and then pick out the one that it wants to look at.
Don't hard-encode device names into userspace tools, that way lies
madness.

> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
> already has symlinks elsewhere), maybe we could just create two devices
> without symlinks. Kan, do you think that would work?

Do not have 2 different structures represent the same hardware device,
that too is a shortcut to madness.

What prevents userspace from handling device names changing today?  Why
are you forcing userspace to pick a specific device name at all?

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 14:38                   ` Greg KH
@ 2021-06-25 14:49                     ` Andi Kleen
  2021-06-25 15:03                       ` Liang, Kan
  2021-06-27 11:02                       ` Greg KH
  0 siblings, 2 replies; 33+ messages in thread
From: Andi Kleen @ 2021-06-25 14:49 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> Device names will change, that's always a given, as the kernel can never
> always make them the same.  That's why userspace needs to scan the bus
> for all devices and then pick out the one that it wants to look at.

In perf the tool doesn't normally know what devices (= pmu) the users 
want to look at. It's all specified on the command line depending on 
what events you want to measure. There's no way for the tool to figure 
that out on its own.


> Don't hard-encode device names into userspace tools, that way lies
> madness.

There's no hard coding in the tools (or at least not for the non json 
event list case). It all comes from the command line. But that is where 
the problem comes from.

>
>> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
>> already has symlinks elsewhere), maybe we could just create two devices
>> without symlinks. Kan, do you think that would work?
> Do not have 2 different structures represent the same hardware device,
> that too is a shortcut to madness.
>
> What prevents userspace from handling device names changing today?  Why
> are you forcing userspace to pick a specific device name at all?

The way the perf tool works is that you have to specify the names on the 
command line:

perf stat -a -e uncore_cha/event=1/ ...

With the numeric identifiers it would be

perf stat -a -e uncore_type_X_Y/event=1/

The tool handles it all abstractly.

So yes the user tools itself can handle it. But the problem is that it 
is directly exposed to the users, so the users would need to change all 
their scripts when switching between the two cases. That is what we're 
trying to avoid -- provide them a way that works on both.

-Andi



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 14:49                     ` Andi Kleen
@ 2021-06-25 15:03                       ` Liang, Kan
  2021-06-25 15:44                         ` Andi Kleen
  2021-06-27 11:02                       ` Greg KH
  1 sibling, 1 reply; 33+ messages in thread
From: Liang, Kan @ 2021-06-25 15:03 UTC (permalink / raw)
  To: Andi Kleen, Greg KH
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa



On 6/25/2021 10:49 AM, Andi Kleen wrote:
> 
>> Device names will change, that's always a given, as the kernel can never
>> always make them the same.  That's why userspace needs to scan the bus
>> for all devices and then pick out the one that it wants to look at.
> 
> In perf the tool doesn't normally know what devices (= pmu) the users 
> want to look at. It's all specified on the command line depending on 
> what events you want to measure. There's no way for the tool to figure 
> that out on its own.
> 
> 
>> Don't hard-encode device names into userspace tools, that way lies
>> madness.
> 
> There's no hard coding in the tools (or at least not for the non json 
> event list case). It all comes from the command line. But that is where 
> the problem comes from.
> 
>>
>>> Anyways thinking about it if Greg doesn't want symlinks (even though 
>>> sysfs
>>> already has symlinks elsewhere), maybe we could just create two devices
>>> without symlinks. Kan, do you think that would work?
>> Do not have 2 different structures represent the same hardware device,
>> that too is a shortcut to madness.
>>
>> What prevents userspace from handling device names changing today?  Why
>> are you forcing userspace to pick a specific device name at all?
> 
> The way the perf tool works is that you have to specify the names on the 
> command line:
> 
> perf stat -a -e uncore_cha/event=1/ ...
> 
> With the numeric identifiers it would be
> 
> perf stat -a -e uncore_type_X_Y/event=1/
> 
> The tool handles it all abstractly.
> 
> So yes the user tools itself can handle it. But the problem is that it 
> is directly exposed to the users, so the users would need to change all 
> their scripts when switching between the two cases. That is what we're 
> trying to avoid -- provide them a way that works on both.
> 

We have an attribute "caps/pmu_name" for the core PMU. Maybe we should 
add it for uncore PMU as well. For example,

$ cat /sys/devices/uncore_type_0_0/caps/pmu_name
cha_0

Userspace tool can get clues about what type_0_0 is.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 15:03                       ` Liang, Kan
@ 2021-06-25 15:44                         ` Andi Kleen
  2021-06-25 15:57                           ` Liang, Kan
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-25 15:44 UTC (permalink / raw)
  To: Liang, Kan, Greg KH
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> We have an attribute "caps/pmu_name" for the core PMU. Maybe we should 
> add it for uncore PMU as well. For example,
>
> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
> cha_0
>
> Userspace tool can get clues about what type_0_0 is.

It would break all the old tools, but I suppose it could work for 
updated tools.

It isn't only perf that is parsing this, but some other libraries too. 
They all would need to be updated too.

I think I still prefer the symlink. It would just work and keep full 
compatibility.

But yes maybe there is no choice.

-Andi



^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 15:44                         ` Andi Kleen
@ 2021-06-25 15:57                           ` Liang, Kan
  2021-06-25 16:18                             ` Liang, Kan
  0 siblings, 1 reply; 33+ messages in thread
From: Liang, Kan @ 2021-06-25 15:57 UTC (permalink / raw)
  To: Andi Kleen, Greg KH
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa



On 6/25/2021 11:44 AM, Andi Kleen wrote:
> 
>> We have an attribute "caps/pmu_name" for the core PMU. Maybe we should 
>> add it for uncore PMU as well. For example,
>>
>> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
>> cha_0
>>
>> Userspace tool can get clues about what type_0_0 is.
> 
> It would break all the old tools, but I suppose it could work for 
> updated tools.
> 

Right, users have to update their perf tool to use the new name, 
uncore_cha_0.

> It isn't only perf that is parsing this, but some other libraries too. 
> They all would need to be updated too.
>

Yes, I will update the Documentation/ABI/ in V2 to indicate the changes. 
If any other libraries are broken, they may get clues from the Document.

Thanks,
Kan

> I think I still prefer the symlink. It would just work and keep full 
> compatibility.
> 
> But yes maybe there is no choice.
> 
> -Andi
> 
> 

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 15:57                           ` Liang, Kan
@ 2021-06-25 16:18                             ` Liang, Kan
  0 siblings, 0 replies; 33+ messages in thread
From: Liang, Kan @ 2021-06-25 16:18 UTC (permalink / raw)
  To: Andi Kleen, Greg KH
  Cc: peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa



On 6/25/2021 11:57 AM, Liang, Kan wrote:
> 
> 
> On 6/25/2021 11:44 AM, Andi Kleen wrote:
>>
>>> We have an attribute "caps/pmu_name" for the core PMU. Maybe we 
>>> should add it for uncore PMU as well. For example,
>>>
>>> $ cat /sys/devices/uncore_type_0_0/caps/pmu_name
>>> cha_0
>>>
>>> Userspace tool can get clues about what type_0_0 is.
>>
>> It would break all the old tools, but I suppose it could work for 
>> updated tools.
>>
> 
> Right, users have to update their perf tool to use the new name, 
> uncore_cha_0.

I think the above example is misleading. Let me rephrase.
Here is what I'm planing to do in V2.

With the V2 platform-specific patch, uncore driver will only create a 
meaningful uncore name, e.g., uncore_cha_0.

An attribute "caps/pmu_name" is also created to indicate the previous 
name. For example,

$ cat /sys/devices/uncore_cha_0/caps/pmu_name
type_0_0

If any users use the old numeric name, they have to update either their 
script or a perf tool which supports "caps/pmu_name".

In the future, if the users already use a perf tool which supports 
"caps/pmu_name", nothing needs to be updated. The old numeric name 
should just work.

Thanks,
Kan

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-25 14:49                     ` Andi Kleen
  2021-06-25 15:03                       ` Liang, Kan
@ 2021-06-27 11:02                       ` Greg KH
  2021-06-27 16:30                         ` Andi Kleen
  1 sibling, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-27 11:02 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Fri, Jun 25, 2021 at 07:49:36AM -0700, Andi Kleen wrote:
> 
> > Device names will change, that's always a given, as the kernel can never
> > always make them the same.  That's why userspace needs to scan the bus
> > for all devices and then pick out the one that it wants to look at.
> 
> In perf the tool doesn't normally know what devices (= pmu) the users want
> to look at. It's all specified on the command line depending on what events
> you want to measure. There's no way for the tool to figure that out on its
> own.
> 
> 
> > Don't hard-encode device names into userspace tools, that way lies
> > madness.
> 
> There's no hard coding in the tools (or at least not for the non json event
> list case). It all comes from the command line. But that is where the
> problem comes from.

Then do not break things by renaming the device name, as you all have
now stated that this name is part of the user/kernel api.

But really, I do not see why this is an issue, why isn't userspace just
properly walking the list of devices and picking the one on this
specific system that they want to look at?

> > > Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
> > > already has symlinks elsewhere), maybe we could just create two devices
> > > without symlinks. Kan, do you think that would work?
> > Do not have 2 different structures represent the same hardware device,
> > that too is a shortcut to madness.
> > 
> > What prevents userspace from handling device names changing today?  Why
> > are you forcing userspace to pick a specific device name at all?
> 
> The way the perf tool works is that you have to specify the names on the
> command line:
> 
> perf stat -a -e uncore_cha/event=1/ ...
> 
> With the numeric identifiers it would be
> 
> perf stat -a -e uncore_type_X_Y/event=1/
> 
> The tool handles it all abstractly.

Great, and that device name is something that is unique per machine.
And per boot.  So why are you suddenly thinking that this name has to be
"stable"?

If you think it does have to be stable, that was your choice, so now you
must keep it stable.  Don't try to mess with symlinks and the like
please, as again, that way lies madness and unmaintainability for the
next 20+ years.

> So yes the user tools itself can handle it. But the problem is that it is
> directly exposed to the users, so the users would need to change all their
> scripts when switching between the two cases. That is what we're trying to
> avoid -- provide them a way that works on both.

But these are different systems!  Why would anyone expect that the
device name is the same on different systems?  If you insist on keeping
the name identical for newer kernel versions, then again, that was your
choice and now you have to do that.  Do not try to work around your own
requirement by using a symlink.

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-27 11:02                       ` Greg KH
@ 2021-06-27 16:30                         ` Andi Kleen
  2021-06-28  6:55                           ` Greg KH
  0 siblings, 1 reply; 33+ messages in thread
From: Andi Kleen @ 2021-06-27 16:30 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


> Then do not break things by renaming the device name, as you all have
> now stated that this name is part of the user/kernel api.

The renaming comes from the fallback mode on future systems. In the 
fallback mode the driver doesn't know the true name, so it has to use  
the numeric name. If you don't use the fallback mode and have the full 
driver then yes you'll get the same names as always (or at least as they 
make sense for the hardware).

But we would like to have the fallback mode too to allow more people use 
uncore monitoring, and that's where the need to for the second name 
comes in.

>
> But really, I do not see why this is an issue, why isn't userspace just
> properly walking the list of devices and picking the one on this
> specific system that they want to look at?

perf is not an fully automated tool that knows what it wants to look at. 
It's not like udev etc.

It's a tool to let the user specify what they want to measure on the 
command line. And that specification is through the pmu name.

Of course it walks the list to find the name, but it can only chose 
based on the name the user specified.

It's like the ftrace tool doesn't know what trace points to measure on 
its own, it just knows what is specified on the command line. Or the ip 
tool doesn't know on its own what network device names to use for some 
network configuration, it has to get the information from the command line.


>
>>>> Anyways thinking about it if Greg doesn't want symlinks (even though sysfs
>>>> already has symlinks elsewhere), maybe we could just create two devices
>>>> without symlinks. Kan, do you think that would work?
>>> Do not have 2 different structures represent the same hardware device,
>>> that too is a shortcut to madness.
>>>
>>> What prevents userspace from handling device names changing today?  Why
>>> are you forcing userspace to pick a specific device name at all?
>> The way the perf tool works is that you have to specify the names on the
>> command line:
>>
>> perf stat -a -e uncore_cha/event=1/ ...
>>
>> With the numeric identifiers it would be
>>
>> perf stat -a -e uncore_type_X_Y/event=1/
>>
>> The tool handles it all abstractly.
> Great, and that device name is something that is unique per machine.
> And per boot.

No it's not unique and per boot. It's always the same on a given 
platform, it's specified by firmware. I would expect the names to be 
stable over all systems of a given chip.


>   So why are you suddenly thinking that this name has to be
> "stable"?
It's about as stable as the existing names. The existing names change 
sometimes too when the hardware changes (for example before Skylake we 
had "uncore_cbo", since Skylake there is "uncore_cha"). The numeric 
identifier should have similar stability (doesn't change as long as the 
hardware doesn't change significantly)


>
> If you think it does have to be stable, that was your choice, so now you
> must keep it stable.  Don't try to mess with symlinks and the like
> please, as again, that way lies madness and unmaintainability for the
> next 20+ years.

We keep it as stable as possible, but in the fallback mode only the 
numeric IDs are possible. In the "driver knows full hardware" mode it 
keeps the existing names, as possible.


>
>> So yes the user tools itself can handle it. But the problem is that it is
>> directly exposed to the users, so the users would need to change all their
>> scripts when switching between the two cases. That is what we're trying to
>> avoid -- provide them a way that works on both.
> But these are different systems!  Why would anyone expect that the
> device name is the same on different systems?

The scenario is that you run the same system but with two different kernels.

Kernel 1 doesn't know the model number and can only operate in fallback 
mode:

It only shows numeric IDs and that's what you have to use

Kernel 2 knows the model number and has a full driver which supports the 
full Linux standard naming.

You can use the standard names (like uncore_cha)

But the problem is that it would be impossible to write a script that 
works on both Kernel 1 and Kernel 2 on the same system without the 
symlink or equivalent.


>   If you insist on keeping
> the name identical for newer kernel versions, then again, that was your
> choice and now you have to do that.  Do not try to work around your own
> requirement by using a symlink.

Are you suggesting to only use numerical names everywhere?

That would be a big change for existing users. The idea was that people 
who use the fully enabled driver can use the much nicer symbolic names. 
But people who care about writing scripts that work everywhere can use 
the more difficult to use numeric names.

Anyways it looks like we're setting on using the "name" method suggested 
by Kan. I must say I'm not a big fan of it though, it's just another 
incompatible break in the perf ABI that would be totally avoidable.

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-27 16:30                         ` Andi Kleen
@ 2021-06-28  6:55                           ` Greg KH
  2021-06-28 15:00                             ` Andi Kleen
  0 siblings, 1 reply; 33+ messages in thread
From: Greg KH @ 2021-06-28  6:55 UTC (permalink / raw)
  To: Andi Kleen
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa

On Sun, Jun 27, 2021 at 09:30:53AM -0700, Andi Kleen wrote:
> 
> > Then do not break things by renaming the device name, as you all have
> > now stated that this name is part of the user/kernel api.
> 
> The renaming comes from the fallback mode on future systems. In the fallback
> mode the driver doesn't know the true name, so it has to use  the numeric
> name. If you don't use the fallback mode and have the full driver then yes
> you'll get the same names as always (or at least as they make sense for the
> hardware).
> 
> But we would like to have the fallback mode too to allow more people use
> uncore monitoring, and that's where the need to for the second name comes
> in.

So then just always use the "fallback" name if that is going to be the
name you have for this hardware device.  Why would you want it to be
renamed later on to a "fancier" name if there is only going to be
one-per-chipset-type anyway?

Naming is hard, make it simple and do not change it if your userspace
tools are not going to be able to handle the issue.  Do NOT paper over
your naming scheme with symlinks from the very beginning, as that shows
the naming scheme is flawed.

thanks,

greg k-h

^ permalink raw reply	[flat|nested] 33+ messages in thread

* Re: [PATCH 2/7] perf: Create a symlink for a PMU
  2021-06-28  6:55                           ` Greg KH
@ 2021-06-28 15:00                             ` Andi Kleen
  0 siblings, 0 replies; 33+ messages in thread
From: Andi Kleen @ 2021-06-28 15:00 UTC (permalink / raw)
  To: Greg KH
  Cc: kan.liang, peterz, mingo, linux-kernel, eranian, namhyung, acme, jolsa


On 6/27/2021 11:55 PM, Greg KH wrote:
> On Sun, Jun 27, 2021 at 09:30:53AM -0700, Andi Kleen wrote:
>>> Then do not break things by renaming the device name, as you all have
>>> now stated that this name is part of the user/kernel api.
>> The renaming comes from the fallback mode on future systems. In the fallback
>> mode the driver doesn't know the true name, so it has to use  the numeric
>> name. If you don't use the fallback mode and have the full driver then yes
>> you'll get the same names as always (or at least as they make sense for the
>> hardware).
>>
>> But we would like to have the fallback mode too to allow more people use
>> uncore monitoring, and that's where the need to for the second name comes
>> in.
> So then just always use the "fallback" name if that is going to be the
> name you have for this hardware device.  Why would you want it to be
> renamed later on to a "fancier" name if there is only going to be
> one-per-chipset-type anyway?

It's an ugly numeric name, difficult to use

perf stat -e uncore_0_2//

instead of

perf stat -e uncore_cha//

It wouldn't exactly be an improvement for the full driver.

-Andi


^ permalink raw reply	[flat|nested] 33+ messages in thread

end of thread, other threads:[~2021-06-28 15:33 UTC | newest]

Thread overview: 33+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-06-24  1:22 [PATCH 0/7] perf: Add Sapphire Rapids server uncore support kan.liang
2021-06-24  1:22 ` [PATCH 1/7] driver core: Add a way to get to bus devices kset kan.liang
2021-06-24  5:41   ` Greg KH
2021-06-24  1:22 ` [PATCH 2/7] perf: Create a symlink for a PMU kan.liang
2021-06-24  5:48   ` Greg KH
2021-06-24 14:24     ` Andi Kleen
2021-06-24 14:29       ` Greg KH
2021-06-24 15:24         ` Andi Kleen
2021-06-24 15:31           ` Greg KH
2021-06-24 17:07             ` Liang, Kan
2021-06-24 17:35               ` Greg KH
2021-06-24 17:49                 ` Andi Kleen
2021-06-25  5:18                   ` Greg KH
2021-06-25  5:17               ` Greg KH
2021-06-24 17:28             ` Andi Kleen
2021-06-25  5:19               ` Greg KH
2021-06-25 14:22                 ` Andi Kleen
2021-06-25 14:38                   ` Greg KH
2021-06-25 14:49                     ` Andi Kleen
2021-06-25 15:03                       ` Liang, Kan
2021-06-25 15:44                         ` Andi Kleen
2021-06-25 15:57                           ` Liang, Kan
2021-06-25 16:18                             ` Liang, Kan
2021-06-27 11:02                       ` Greg KH
2021-06-27 16:30                         ` Andi Kleen
2021-06-28  6:55                           ` Greg KH
2021-06-28 15:00                             ` Andi Kleen
2021-06-24  1:22 ` [PATCH 3/7] perf/x86/intel/uncore: Create a symlink for an uncore PMU kan.liang
2021-06-24  5:44   ` Greg KH
2021-06-24  1:22 ` [PATCH 4/7] perf/x86/intel/uncore: Add Sapphire Rapids server support kan.liang
2021-06-24  1:22 ` [PATCH 5/7] perf/x86/intel/uncore: Factor out snr_uncore_mmio_map() kan.liang
2021-06-24  1:22 ` [PATCH 6/7] perf/x86/intel/uncore: Support free-running counters on Sapphire Rapids server kan.liang
2021-06-24  1:22 ` [PATCH 7/7] perf/x86/intel/uncore: Fix invalid unit check kan.liang

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).