linux-kernel.vger.kernel.org archive mirror
* [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
@ 2023-06-06  7:49 Shuai Xue
  2023-06-06  7:49 ` [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Shuai Xue
                   ` (4 more replies)
  0 siblings, 5 replies; 31+ messages in thread
From: Shuai Xue @ 2023-06-06  7:49 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song, xueshuai

changes since v5:
- Rewrite the commit log to follow policy in pci_ids.h (Bjorn Helgaas)
- return error code when __dwc_pcie_pmu_probe failed (Baolin Wang)
- call 'cpuhp_remove_multi_state()' when exiting the driver. (Baolin Wang)
- pick up Reviewed-by tag from Baolin for Patch 1 and 3

changes since v4:

1. addressing comments from Bjorn Helgaas:
- reorder the includes by alpha
- change all macros with upper-case hex
- change ras_des type into u16
- remove unnecessary outer "()"
- minor format changes

2. addressing comments from Jonathan Cameron:
- rewrite doc and add an example to show how to use lane event

3. fix compile error reported by: kernel test robot
- remove COMPILE_TEST and add depend on PCI in kconfig
- add Reported-by: kernel test robot <lkp@intel.com>

Changes since v3:

1. addressing comments from Robin Murphy:
- add a prepare patch to define pci id in linux/pci_ids.h
- remove unnecessary 64BIT dependency
- fix DWC_PCIE_PER_EVENT_OFF/ON macro
- remove dwc_pcie_pmu struct and move all its fields into dwc_pcie_rp_info
- remove unnecessary format field show
- use sysfs_emit() instead of all the assorted sprintf() and snprintf() calls.
- remove unnecessary spaces and remove unnecessary cast to follow event show convention
- remove pcie_pmu_event_attr_is_visible
- fix a refcount leak on the error branch when walking PCI devices in for_each_pci_dev
- remove bdf field from dwc_pcie_rp_info and calculate it at runtime
- finish all the checks before allocating rp_info to avoid hanging wasted memory
- remove some unused fields
- move control register configuration out of the sub functions into .add()
- make function return type with a proper signature
- fix lane event count enable by clear DWC_PCIE_CNT_ENABLE field first
- pass rp_info directly to the read_*_counter helpers and in start, stop and add callbacks
- move event type validation into .event_init()
- use is_sampling_event() to be consistent with everything else of pmu drivers
- remove unnecessary dev_err message in .event_init()
- return EINVAL instead of EOPNOTSUPP for an invalid event
- finish all the checks before start modifying the event
- fix sibling event check by comparing event->pmu with sibling->pmu
- probe PMU for each rootport independently
- use .update() as .read() directly
- remove dynamically generating symbolic name of lane event
- redefine static symbolic name of lane event and leave lane field to user
- add CPU hotplug support

2. addressing comments from Baolin:
- add a mask to avoid possible overflow

Changes since v2 addressing comments from Baolin:
- remove redundant macro definitions
- use dev_err to print error message
- change pmu_is_register to boolean
- use PLATFORM_DEVID_NONE macro
- fix module author format

Changes since v1:

1. address comments from Jonathan:
- drop macro for PMU name and VSEC version
- simplify code with PCI standard macro
- simplify code with FIELD_PREP()/FIELD_GET() to replace shift macro
- name register field with single _ instead of double
- wrap dwc_pcie_pmu_{write}_dword out and drop meaningless sanity check
- check vendor id while matching vsec with pci_find_vsec_capability()
- remove RP_NUM_MAX and use a list to organize PMU devices for rootports
- replace DWC_PCIE_CREATE_BDF with standard PCI_DEVID
- add comments on grouping register fields together

2. address comments from Bjorn:
- rename DWC_PCIE_VSEC_ID to DWC_PCIE_VSEC_RAS_DES_ID
- rename cap_pos to ras_des
- simplify declare of device_attribute with DEVICE_ATTR_RO
- simplify code with PCI standard macro and API like pcie_get_width_cap()
- fix some code style problem and typo
- drop meaningless sanity check of container_of

3. address comments from Yicong:
- use sysfs_emit() to replace sprintf()
- simplify iteration of pci device with for_each_pci_dev
- pick preferred CPUs on a near die and add comments
- unregister PMU drivers only for failed ones
- log on behalf of the PMU device and give more hints
- fix some code style problem

(Thanks for all comments and they are very valuable to me)

This patchset adds PCIe Performance Monitoring Unit (PMU) driver support
for the T-Head Yitian 710 SoC. Yitian 710 is based on the Synopsys PCI Express
Core controller IP, which provides a statistics feature.

Shuai Xue (4):
  docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  drivers/perf: add DesignWare PCIe PMU driver
  MAINTAINERS: add maintainers for DesignWare PCIe PMU driver

 .../admin-guide/perf/dwc_pcie_pmu.rst         |  97 +++
 Documentation/admin-guide/perf/index.rst      |   1 +
 MAINTAINERS                                   |   6 +
 drivers/infiniband/hw/erdma/erdma_hw.h        |   2 -
 drivers/perf/Kconfig                          |   7 +
 drivers/perf/Makefile                         |   1 +
 drivers/perf/dwc_pcie_pmu.c                   | 706 ++++++++++++++++++
 include/linux/pci_ids.h                       |   2 +
 8 files changed, 820 insertions(+), 2 deletions(-)
 create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
 create mode 100644 drivers/perf/dwc_pcie_pmu.c

-- 
2.20.1.12.g72788fdb



* [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
@ 2023-06-06  7:49 ` Shuai Xue
  2023-07-27  8:57   ` Jonathan Cameron
  2023-06-06  7:49 ` [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h Shuai Xue
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-06-06  7:49 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song, xueshuai

Alibaba's T-Head Yitian 710 SoC includes Synopsys' DesignWare Core PCIe
controller, which implements a PMU for performance and functional debugging
to facilitate system maintenance.

Document it to provide guidance on how to use it.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 .../admin-guide/perf/dwc_pcie_pmu.rst         | 97 +++++++++++++++++++
 Documentation/admin-guide/perf/index.rst      |  1 +
 2 files changed, 98 insertions(+)
 create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst

diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
new file mode 100644
index 000000000000..c1f671cb64ec
--- /dev/null
+++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
@@ -0,0 +1,97 @@
+======================================================================
+Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
+======================================================================
+
+DesignWare Cores (DWC) PCIe PMU
+===============================
+
+The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
+only a PCIe configuration space register block provided by each PCIe Root
+Port in a Vendor-Specific Extended Capability named RAS DES (Debug, Error
+injection, and Statistics).
+
+As the name indicates, the RAS DES capability supports system level
+debugging, AER error injection, and collection of statistics. To facilitate
+collection of statistics, Synopsys DesignWare Cores PCIe controller
+provides the following two features:
+
+- Time Based Analysis (RX/TX data throughput and time spent in each
+  low-power LTSSM state)
+- Lane Event counters (Error and Non-Error for lanes)
+
+Time Based Analysis
+-------------------
+
+Using this feature you can obtain information regarding RX/TX data
+throughput and time spent in each low-power LTSSM state by the controller.
+
+The counters are 64 bits wide and measure data in two categories:
+
+- percentage of time the controller stays in a given LTSSM state during a
+  configurable duration. This applies to each Event in Group#0.
+- amount of data processed (Units of 16 bytes). This applies to each
+  Event in Group#1.
+
+Lane Event counters
+-------------------
+
+Using this feature you can obtain Error and Non-Error information for a
+specific lane of the controller.
+
+The counters are 32 bits wide and the measured event is selected by:
+
+- Group i
+- Event j within the Group i
+- and Lane k
+
+Some of the event counters only exist for specific configurations.
+
+DesignWare Cores (DWC) PCIe PMU Driver
+=======================================
+
+This driver adds a PMU device for each PCIe Root Port. The PMU device is
+named based on the BDF of the Root Port. For example,
+
+    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
+
+the PMU device name for this Root Port is dwc_rootport_3018.
+
+The DWC PCIe PMU driver registers a perf PMU driver, which provides
+description of available events and configuration options in sysfs, see
+/sys/bus/event_source/devices/dwc_rootport_{bdf}.
+
+The "format" directory describes format of the config, fields of the
+perf_event_attr structure. The "events" directory provides configuration
+templates for all documented events.  For example,
+"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x21,type=0x1".
+
+The "perf list" command shall list the available events from sysfs, e.g.::
+
+    $# perf list | grep dwc_rootport
+    <...>
+    dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/        [Kernel PMU event]
+    <...>
+    dwc_rootport_3018/rx_memory_read,lane=?/               [Kernel PMU event]
+
+Time Based Analysis Event Usage
+-------------------------------
+
+Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
+
+    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
+
+The average RX/TX bandwidth can be calculated using the following formula:
+
+    PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
+    PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
+
+Lane Event Usage
+-------------------------------
+
+Each lane has the same event set and to avoid generating a list of hundreds
+of events, the user needs to specify the lane ID explicitly, e.g.::
+
+    $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
+
+The driver does not support sampling, therefore "perf record" will not
+work. Per-task (without "-a") perf sessions are not supported.
diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
index 9de64a40adab..11a80cd28a2e 100644
--- a/Documentation/admin-guide/perf/index.rst
+++ b/Documentation/admin-guide/perf/index.rst
@@ -19,5 +19,6 @@ Performance monitor support
    arm_dsu_pmu
    thunderx2-pmu
    alibaba_pmu
+   dwc_pcie_pmu
    nvidia-pmu
    meson-ddr-pmu
-- 
2.20.1.12.g72788fdb



* [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
  2023-06-06  7:49 ` [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Shuai Xue
@ 2023-06-06  7:49 ` Shuai Xue
  2023-06-06 15:31   ` Bjorn Helgaas
  2023-06-06  7:49 ` [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver Shuai Xue
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-06-06  7:49 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song, xueshuai

The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma")
and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the
Vendor ID to linux/pci_ids.h so that it can be shared by several drivers
later.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 drivers/infiniband/hw/erdma/erdma_hw.h | 2 --
 include/linux/pci_ids.h                | 2 ++
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h
index 76ce2856be28..ee35ebef9ee7 100644
--- a/drivers/infiniband/hw/erdma/erdma_hw.h
+++ b/drivers/infiniband/hw/erdma/erdma_hw.h
@@ -11,8 +11,6 @@
 #include <linux/types.h>
 
 /* PCIe device related definition. */
-#define PCI_VENDOR_ID_ALIBABA 0x1ded
-
 #define ERDMA_PCI_WIDTH 64
 #define ERDMA_FUNC_BAR 0
 #define ERDMA_MISX_BAR 2
diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
index 95f33dadb2be..9e8aec472f06 100644
--- a/include/linux/pci_ids.h
+++ b/include/linux/pci_ids.h
@@ -2586,6 +2586,8 @@
 #define PCI_VENDOR_ID_TEKRAM		0x1de1
 #define PCI_DEVICE_ID_TEKRAM_DC290	0xdc29
 
+#define PCI_VENDOR_ID_ALIBABA		0x1ded
+
 #define PCI_VENDOR_ID_TEHUTI		0x1fc9
 #define PCI_DEVICE_ID_TEHUTI_3009	0x3009
 #define PCI_DEVICE_ID_TEHUTI_3010	0x3010
-- 
2.20.1.12.g72788fdb



* [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
  2023-06-06  7:49 ` [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Shuai Xue
  2023-06-06  7:49 ` [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h Shuai Xue
@ 2023-06-06  7:49 ` Shuai Xue
  2023-06-06 15:14   ` Yicong Yang
  2023-06-06  7:49 ` [PATCH v6 4/4] MAINTAINERS: add maintainers for " Shuai Xue
  2023-06-16  8:39 ` [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
  4 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-06-06  7:49 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song, xueshuai

This commit adds PCIe Performance Monitoring Unit (PMU) driver support
for the T-Head Yitian SoC. Yitian is based on the Synopsys PCI Express
Core controller IP, which provides a statistics feature. The PMU is not a PCIe
Root Complex integrated End Point (RCiEP) device but only a set of register
counters provided by each PCIe Root Port.

To facilitate collection of statistics the controller provides the
following two features for each Root Port:

- Time Based Analysis (RX/TX data throughput and time spent in each
  low-power LTSSM state)
- Event counters (Error and Non-Error for lanes)

Note that there is only one counter for each type, and there is no overflow
interrupt.

This driver adds a PMU device for each PCIe Root Port. The PMU device is
named based on the BDF of the Root Port. For example,

    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)

the PMU device name for this Root Port is dwc_rootport_3018.
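
As a worked check of the naming above, here is a minimal user-space sketch,
assuming the name is built by printing PCI_DEVID(bus, devfn) with "%x" as the
probe path in this patch does. The two macros are local copies of the kernel's
PCI_DEVFN() and PCI_DEVID() definitions, and dwc_pmu_name() is a hypothetical
helper that is not part of the patch:

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Local copies of the kernel's PCI_DEVFN()/PCI_DEVID() helpers. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))

/*
 * Hypothetical helper: format the PMU name for a Root Port at
 * bus:slot.func. Returns the number of characters snprintf() would write.
 */
static int dwc_pmu_name(char *buf, size_t len,
			uint8_t bus, uint8_t slot, uint8_t func)
{
	/* A PCI slot number is 5 bits wide, a function number 3 bits. */
	assert(slot < 32 && func < 8);

	return snprintf(buf, len, "dwc_rootport_%x",
			(unsigned int)PCI_DEVID(bus, PCI_DEVFN(slot, func)));
}
```

For 30:03.0 the devfn is (3 << 3) | 0 = 0x18, so the device ID is 0x3018 and
the helper produces dwc_rootport_3018.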

Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::

    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
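
The named event is shorthand for a raw config value. A minimal sketch of that
encoding, assuming the field layout given by the DWC_PCIE_CONFIG_* masks in
this patch (eventid in bits 15:0, type in bits 19:16, lane in bits 27:20) and
the numbering of enum dwc_pcie_event_type. dwc_config() and the DWC_TYPE_*
names are hypothetical and exist only for illustration:

```c
#include <assert.h>
#include <stdint.h>

/* Event types, numbered as in enum dwc_pcie_event_type in the driver. */
#define DWC_TYPE_TIME_BASED	1
#define DWC_TYPE_LANE		2

/* Hypothetical helper: pack eventid/type/lane into perf_event_attr.config. */
static uint64_t dwc_config(uint16_t eventid, uint8_t type, uint8_t lane)
{
	assert(type < 16);	/* type is a 4-bit field */

	return (uint64_t)eventid |
	       ((uint64_t)type << 16) |
	       ((uint64_t)lane << 20);
}
```

Under these assumptions, Rx_PCIe_TLP_Data_Payload (eventid 0x21) encodes to
0x10021, and the lane event rx_memory_read (eventid 0x70C) on lane 4 encodes
to 0x42070C.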

The average RX bandwidth can be calculated like this:

    PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
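
The formula can be sketched in C as follows. This is an illustration only,
assuming the counter value is the one reported by "perf stat" and the window
is the elapsed time of the measurement in seconds. dwc_pcie_bandwidth() and
DWC_PCIE_DATA_UNIT_BYTES are hypothetical names, not part of the driver:

```c
#include <assert.h>
#include <stdint.h>

/* Each counter increment represents 16 bytes of TLP data payload. */
#define DWC_PCIE_DATA_UNIT_BYTES	16

/* Hypothetical helper: average bandwidth in bytes per second. */
static double dwc_pcie_bandwidth(uint64_t count, double window_sec)
{
	assert(window_sec > 0.0);

	return (double)count * DWC_PCIE_DATA_UNIT_BYTES / window_sec;
}
```

For example, a count of 62500000 over a 2 second window corresponds to
500000000 bytes per second.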

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
Reported-by: kernel test robot <lkp@intel.com>
Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 drivers/perf/Kconfig        |   7 +
 drivers/perf/Makefile       |   1 +
 drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
 3 files changed, 714 insertions(+)
 create mode 100644 drivers/perf/dwc_pcie_pmu.c

diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
index 711f82400086..6ff3921d7a62 100644
--- a/drivers/perf/Kconfig
+++ b/drivers/perf/Kconfig
@@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
 	  Enable perf support for Marvell DDR Performance monitoring
 	  event on CN10K platform.
 
+config DWC_PCIE_PMU
+	tristate "Enable Synopsys DesignWare PCIe PMU Support"
+	depends on (ARM64 && PCI)
+	help
+	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
+	  monitoring event on Yitian 710 platform.
+
 source "drivers/perf/arm_cspmu/Kconfig"
 
 source "drivers/perf/amlogic/Kconfig"
diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
index dabc859540ce..13a6d1b286da 100644
--- a/drivers/perf/Makefile
+++ b/drivers/perf/Makefile
@@ -22,5 +22,6 @@ obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
 obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
 obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
 obj-$(CONFIG_ALIBABA_UNCORE_DRW_PMU) += alibaba_uncore_drw_pmu.o
+obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
 obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
 obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
new file mode 100644
index 000000000000..8bfcf6e0662d
--- /dev/null
+++ b/drivers/perf/dwc_pcie_pmu.c
@@ -0,0 +1,706 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Synopsys DesignWare PCIe PMU driver
+ *
+ * Copyright (C) 2021-2023 Alibaba Inc.
+ */
+
+#include <linux/bitfield.h>
+#include <linux/bitops.h>
+#include <linux/cpuhotplug.h>
+#include <linux/cpumask.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/kernel.h>
+#include <linux/list.h>
+#include <linux/perf_event.h>
+#include <linux/pci.h>
+#include <linux/platform_device.h>
+#include <linux/smp.h>
+#include <linux/sysfs.h>
+#include <linux/types.h>
+
+#define DWC_PCIE_VSEC_RAS_DES_ID		0x02
+
+#define DWC_PCIE_EVENT_CNT_CTL			0x8
+
+/*
+ * Event Counter Data Select includes two parts:
+ * - 27-24: Group number(4-bit: 0..0x7)
+ * - 23-16: Event number(8-bit: 0..0x13) within the Group
+ *
+ * Put them together as the TRM does.
+ */
+#define DWC_PCIE_CNT_EVENT_SEL			GENMASK(27, 16)
+#define DWC_PCIE_CNT_LANE_SEL			GENMASK(11, 8)
+#define DWC_PCIE_CNT_STATUS			BIT(7)
+#define DWC_PCIE_CNT_ENABLE			GENMASK(4, 2)
+#define DWC_PCIE_PER_EVENT_OFF			0x1
+#define DWC_PCIE_PER_EVENT_ON			0x3
+#define DWC_PCIE_EVENT_CLEAR			GENMASK(1, 0)
+#define DWC_PCIE_EVENT_PER_CLEAR		0x1
+
+#define DWC_PCIE_EVENT_CNT_DATA			0xC
+
+#define DWC_PCIE_TIME_BASED_ANAL_CTL		0x10
+#define DWC_PCIE_TIME_BASED_REPORT_SEL		GENMASK(31, 24)
+#define DWC_PCIE_TIME_BASED_DURATION_SEL	GENMASK(15, 8)
+#define DWC_PCIE_DURATION_MANUAL_CTL		0x0
+#define DWC_PCIE_DURATION_1MS			0x1
+#define DWC_PCIE_DURATION_10MS			0x2
+#define DWC_PCIE_DURATION_100MS			0x3
+#define DWC_PCIE_DURATION_1S			0x4
+#define DWC_PCIE_DURATION_2S			0x5
+#define DWC_PCIE_DURATION_4S			0x6
+#define DWC_PCIE_DURATION_4US			0xFF
+#define DWC_PCIE_TIME_BASED_TIMER_START		BIT(0)
+#define DWC_PCIE_TIME_BASED_CNT_ENABLE		0x1
+
+#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW	0x14
+#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH	0x18
+
+/* Event attributes */
+#define DWC_PCIE_CONFIG_EVENTID			GENMASK(15, 0)
+#define DWC_PCIE_CONFIG_TYPE			GENMASK(19, 16)
+#define DWC_PCIE_CONFIG_LANE			GENMASK(27, 20)
+
+#define DWC_PCIE_EVENT_ID(event)	FIELD_GET(DWC_PCIE_CONFIG_EVENTID, (event)->attr.config)
+#define DWC_PCIE_EVENT_TYPE(event)	FIELD_GET(DWC_PCIE_CONFIG_TYPE, (event)->attr.config)
+#define DWC_PCIE_EVENT_LANE(event)	FIELD_GET(DWC_PCIE_CONFIG_LANE, (event)->attr.config)
+
+enum dwc_pcie_event_type {
+	DWC_PCIE_TYPE_INVALID,
+	DWC_PCIE_TIME_BASE_EVENT,
+	DWC_PCIE_LANE_EVENT,
+};
+
+#define DWC_PCIE_LANE_EVENT_MAX_PERIOD		GENMASK_ULL(31, 0)
+#define DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD	GENMASK_ULL(63, 0)
+
+
+struct dwc_pcie_pmu {
+	struct pci_dev		*pdev;		/* Root Port device */
+	u16			ras_des;	/* RAS DES capability offset */
+	u32			nr_lanes;
+
+	struct list_head	pmu_node;
+	struct hlist_node	cpuhp_node;
+	struct pmu		pmu;
+	struct perf_event	*event;
+	int			oncpu;
+};
+
+struct dwc_pcie_pmu_priv {
+	struct device *dev;
+	struct list_head pmu_nodes;
+};
+
+#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
+
+static struct platform_device *dwc_pcie_pmu_dev;
+static int dwc_pcie_pmu_hp_state;
+
+static ssize_t cpumask_show(struct device *dev,
+					 struct device_attribute *attr,
+					 char *buf)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(dev_get_drvdata(dev));
+
+	return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->oncpu));
+}
+static DEVICE_ATTR_RO(cpumask);
+
+static struct attribute *dwc_pcie_pmu_cpumask_attrs[] = {
+	&dev_attr_cpumask.attr,
+	NULL
+};
+
+static struct attribute_group dwc_pcie_cpumask_attr_group = {
+	.attrs = dwc_pcie_pmu_cpumask_attrs,
+};
+
+struct dwc_pcie_format_attr {
+	struct device_attribute attr;
+	u64 field;
+	int config;
+};
+
+static ssize_t dwc_pcie_pmu_format_show(struct device *dev,
+					struct device_attribute *attr,
+					char *buf)
+{
+	struct dwc_pcie_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
+	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
+
+	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
+}
+
+#define _dwc_pcie_format_attr(_name, _cfg, _fld)				\
+	(&((struct dwc_pcie_format_attr[]) {{					\
+		.attr = __ATTR(_name, 0444, dwc_pcie_pmu_format_show, NULL),	\
+		.config = _cfg,							\
+		.field = _fld,							\
+	}})[0].attr.attr)
+
+#define dwc_pcie_format_attr(_name, _fld)	_dwc_pcie_format_attr(_name, 0, _fld)
+
+static struct attribute *dwc_pcie_format_attrs[] = {
+	dwc_pcie_format_attr(type, DWC_PCIE_CONFIG_TYPE),
+	dwc_pcie_format_attr(eventid, DWC_PCIE_CONFIG_EVENTID),
+	dwc_pcie_format_attr(lane, DWC_PCIE_CONFIG_LANE),
+	NULL,
+};
+
+static struct attribute_group dwc_pcie_format_attrs_group = {
+	.name = "format",
+	.attrs = dwc_pcie_format_attrs,
+};
+
+struct dwc_pcie_event_attr {
+	struct device_attribute attr;
+	enum dwc_pcie_event_type type;
+	u16 eventid;
+	u8 lane;
+};
+
+static ssize_t dwc_pcie_event_show(struct device *dev,
+				struct device_attribute *attr, char *buf)
+{
+	struct dwc_pcie_event_attr *eattr;
+
+	eattr = container_of(attr, typeof(*eattr), attr);
+
+	if (eattr->type == DWC_PCIE_LANE_EVENT)
+		return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
+				  eattr->eventid, eattr->type);
+
+	return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", eattr->eventid,
+		       eattr->type);
+}
+
+#define DWC_PCIE_EVENT_ATTR(_name, _type, _eventid, _lane)		\
+	(&((struct dwc_pcie_event_attr[]) {{				\
+		.attr = __ATTR(_name, 0444, dwc_pcie_event_show, NULL),	\
+		.type = _type,						\
+		.eventid = _eventid,					\
+		.lane = _lane,						\
+	}})[0].attr.attr)
+
+#define DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(_name, _eventid)		\
+	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_TIME_BASE_EVENT, _eventid, 0)
+#define DWC_PCIE_PMU_LANE_EVENT_ATTR(_name, _eventid)			\
+	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_LANE_EVENT, _eventid, 0)
+
+static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
+	/* Group #0 */
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
+
+	/* Group #1 */
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
+	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
+
+	/*
+	 * Leave it to the user to specify the lane ID to avoid generating
+	 * a list of hundreds of events.
+	 */
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
+	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
+
+	NULL
+};
+
+static const struct attribute_group dwc_pcie_event_attrs_group = {
+	.name = "events",
+	.attrs = dwc_pcie_pmu_time_event_attrs,
+};
+
+static const struct attribute_group *dwc_pcie_attr_groups[] = {
+	&dwc_pcie_event_attrs_group,
+	&dwc_pcie_format_attrs_group,
+	&dwc_pcie_cpumask_attr_group,
+	NULL
+};
+
+static void dwc_pcie_pmu_lane_event_enable(struct dwc_pcie_pmu *pcie_pmu,
+					   bool enable)
+{
+	struct pci_dev *pdev = pcie_pmu->pdev;
+	u16 ras_des = pcie_pmu->ras_des;
+	u32 val;
+
+	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, &val);
+
+	/* Clear DWC_PCIE_CNT_ENABLE field first */
+	val &= ~DWC_PCIE_CNT_ENABLE;
+	if (enable)
+		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_ON);
+	else
+		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF);
+
+	pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, val);
+}
+
+static void dwc_pcie_pmu_time_based_event_enable(struct dwc_pcie_pmu *pcie_pmu,
+					  bool enable)
+{
+	struct pci_dev *pdev = pcie_pmu->pdev;
+	u16 ras_des = pcie_pmu->ras_des;
+	u32 val;
+
+	pci_read_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
+			      &val);
+
+	if (enable)
+		val |= DWC_PCIE_TIME_BASED_CNT_ENABLE;
+	else
+		val &= ~DWC_PCIE_TIME_BASED_CNT_ENABLE;
+
+	pci_write_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
+			       val);
+}
+
+static u64 dwc_pcie_pmu_read_lane_event_counter(struct dwc_pcie_pmu *pcie_pmu)
+{
+	struct pci_dev *pdev = pcie_pmu->pdev;
+	u16 ras_des = pcie_pmu->ras_des;
+	u32 val;
+
+	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_DATA, &val);
+
+	return val;
+}
+
+static u64 dwc_pcie_pmu_read_time_based_counter(struct dwc_pcie_pmu *pcie_pmu)
+{
+	struct pci_dev *pdev = pcie_pmu->pdev;
+	u16 ras_des = pcie_pmu->ras_des;
+	u64 count;
+	u32 val;
+
+	pci_read_config_dword(
+		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &val);
+	count = val;
+	count <<= 32;
+
+	pci_read_config_dword(
+		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, &val);
+
+	count += val;
+
+	return count;
+}
+
+static void dwc_pcie_pmu_event_update(struct perf_event *event)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+	struct hw_perf_event *hwc = &event->hw;
+	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+	u64 delta, prev, now;
+
+	do {
+		prev = local64_read(&hwc->prev_count);
+
+		if (type == DWC_PCIE_LANE_EVENT)
+			now = dwc_pcie_pmu_read_lane_event_counter(pcie_pmu);
+		else if (type == DWC_PCIE_TIME_BASE_EVENT)
+			now = dwc_pcie_pmu_read_time_based_counter(pcie_pmu);
+
+	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
+
+	if (type == DWC_PCIE_LANE_EVENT)
+		delta = (now - prev) & DWC_PCIE_LANE_EVENT_MAX_PERIOD;
+	else if (type == DWC_PCIE_TIME_BASE_EVENT)
+		delta = (now - prev) & DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD;
+
+	local64_add(delta, &event->count);
+}
+
+static int dwc_pcie_pmu_event_init(struct perf_event *event)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+	struct perf_event *sibling;
+	u32 lane;
+
+	if (event->attr.type != event->pmu->type)
+		return -ENOENT;
+
+	/* We don't support sampling */
+	if (is_sampling_event(event))
+		return -EINVAL;
+
+	/* We cannot support task bound events */
+	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
+		return -EINVAL;
+
+	if (event->group_leader != event &&
+	    !is_software_event(event->group_leader))
+		return -EINVAL;
+
+	for_each_sibling_event(sibling, event->group_leader) {
+		if (sibling->pmu != event->pmu && !is_software_event(sibling))
+			return -EINVAL;
+	}
+
+	if (type == DWC_PCIE_LANE_EVENT) {
+		lane = DWC_PCIE_EVENT_LANE(event);
+		if (lane >= pcie_pmu->nr_lanes)
+			return -EINVAL;
+	}
+
+	event->cpu = pcie_pmu->oncpu;
+
+	return 0;
+}
+
+static void dwc_pcie_pmu_set_period(struct hw_perf_event *hwc)
+{
+	local64_set(&hwc->prev_count, 0);
+}
+
+static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
+{
+	struct hw_perf_event *hwc = &event->hw;
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+
+	hwc->state = 0;
+	dwc_pcie_pmu_set_period(hwc);
+
+	if (type == DWC_PCIE_LANE_EVENT)
+		dwc_pcie_pmu_lane_event_enable(pcie_pmu, true);
+	else if (type == DWC_PCIE_TIME_BASE_EVENT)
+		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
+}
+
+static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+	struct hw_perf_event *hwc = &event->hw;
+
+	if (event->hw.state & PERF_HES_STOPPED)
+		return;
+
+	if (type == DWC_PCIE_LANE_EVENT)
+		dwc_pcie_pmu_lane_event_enable(pcie_pmu, false);
+	else if (type == DWC_PCIE_TIME_BASE_EVENT)
+		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
+
+	dwc_pcie_pmu_event_update(event);
+	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
+}
+
+static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+	struct pci_dev *pdev = pcie_pmu->pdev;
+	struct hw_perf_event *hwc = &event->hw;
+	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
+	int event_id = DWC_PCIE_EVENT_ID(event);
+	int lane = DWC_PCIE_EVENT_LANE(event);
+	u16 ras_des = pcie_pmu->ras_des;
+	u32 ctrl;
+
+	/* Only one counter and it is in use */
+	if (pcie_pmu->event)
+		return -ENOSPC;
+
+	pcie_pmu->event = event;
+	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
+
+	if (type == DWC_PCIE_LANE_EVENT) {
+		/* EVENT_COUNTER_DATA_REG needs to be cleared manually */
+		ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
+			FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
+			FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
+			FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
+		pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL,
+				       ctrl);
+	} else if (type == DWC_PCIE_TIME_BASE_EVENT) {
+		/*
+		 * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
+		 * use it with any manually controlled duration. And it is
+		 * cleared when next measurement starts.
+		 */
+		ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
+			FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
+				   DWC_PCIE_DURATION_MANUAL_CTL) |
+			DWC_PCIE_TIME_BASED_CNT_ENABLE;
+		pci_write_config_dword(
+			pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
+	}
+
+	if (flags & PERF_EF_START)
+		dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
+
+	perf_event_update_userpage(event);
+
+	return 0;
+}
+
+static void dwc_pcie_pmu_event_del(struct perf_event *event, int flags)
+{
+	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
+
+	dwc_pcie_pmu_event_stop(event, flags | PERF_EF_UPDATE);
+	perf_event_update_userpage(event);
+	pcie_pmu->event = NULL;
+}
+
+static int __dwc_pcie_pmu_probe(struct dwc_pcie_pmu_priv *priv)
+{
+	struct pci_dev *pdev = NULL;
+	struct dwc_pcie_pmu *pcie_pmu;
+	char *name;
+	u32 bdf;
+	int ret;
+
+	INIT_LIST_HEAD(&priv->pmu_nodes);
+
+	/* Match the rootport with VSEC_RAS_DES_ID, and register a PMU for it */
+	for_each_pci_dev(pdev) {
+		u16 vsec;
+		u32 val;
+
+		if (!(pci_is_pcie(pdev) &&
+		      pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
+			continue;
+
+		vsec = pci_find_vsec_capability(pdev, PCI_VENDOR_ID_ALIBABA,
+						DWC_PCIE_VSEC_RAS_DES_ID);
+		if (!vsec)
+			continue;
+
+		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
+		if (PCI_VNDR_HEADER_REV(val) != 0x04 ||
+		    PCI_VNDR_HEADER_LEN(val) != 0x100)
+			continue;
+		pci_dbg(pdev,
+			"Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
+
+		bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
+		name = devm_kasprintf(priv->dev, GFP_KERNEL, "dwc_rootport_%x",
+				      bdf);
+		if (!name)
+			return -ENOMEM;
+
+		/* All checks passed, go go go */
+		pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
+		if (!pcie_pmu) {
+			pci_dev_put(pdev);
+			return -ENOMEM;
+		}
+
+		pcie_pmu->pdev = pdev;
+		pcie_pmu->ras_des = vsec;
+		pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
+		pcie_pmu->pmu = (struct pmu){
+			.module		= THIS_MODULE,
+			.attr_groups	= dwc_pcie_attr_groups,
+			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
+			.task_ctx_nr	= perf_invalid_context,
+			.event_init	= dwc_pcie_pmu_event_init,
+			.add		= dwc_pcie_pmu_event_add,
+			.del		= dwc_pcie_pmu_event_del,
+			.start		= dwc_pcie_pmu_event_start,
+			.stop		= dwc_pcie_pmu_event_stop,
+			.read		= dwc_pcie_pmu_event_update,
+		};
+
+		/* Add this instance to the list used by the offline callback */
+		ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
+					       &pcie_pmu->cpuhp_node);
+		if (ret) {
+			pci_err(pcie_pmu->pdev,
+				"Error %d registering hotplug @%x\n", ret, bdf);
+			return ret;
+		}
+		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
+		if (ret) {
+			pci_err(pcie_pmu->pdev,
+				"Error %d registering PMU @%x\n", ret, bdf);
+			cpuhp_state_remove_instance_nocalls(
+				dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node);
+			return ret;
+		}
+
+		/* Add registered PMUs and unregister them when this driver remove */
+		list_add(&pcie_pmu->pmu_node, &priv->pmu_nodes);
+	}
+
+	return 0;
+}
+
+static int dwc_pcie_pmu_remove(struct platform_device *pdev)
+{
+	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
+	struct dwc_pcie_pmu *pcie_pmu;
+
+	list_for_each_entry(pcie_pmu, &priv->pmu_nodes, pmu_node) {
+		cpuhp_state_remove_instance(dwc_pcie_pmu_hp_state,
+					    &pcie_pmu->cpuhp_node);
+		perf_pmu_unregister(&pcie_pmu->pmu);
+	}
+
+	return 0;
+}
+
+static int dwc_pcie_pmu_probe(struct platform_device *pdev)
+{
+	struct dwc_pcie_pmu_priv *priv;
+	int ret;
+
+	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
+	if (!priv)
+		return -ENOMEM;
+
+	priv->dev = &pdev->dev;
+	platform_set_drvdata(pdev, priv);
+
+	/* If one PMU registration fails, remove all. */
+	ret = __dwc_pcie_pmu_probe(priv);
+	if (ret) {
+		dwc_pcie_pmu_remove(pdev);
+		return ret;
+	}
+
+	return 0;
+}
+
+static void dwc_pcie_pmu_migrate(struct dwc_pcie_pmu *pcie_pmu, unsigned int cpu)
+{
+	/* This PMU does NOT support interrupt, just migrate context. */
+	perf_pmu_migrate_context(&pcie_pmu->pmu, pcie_pmu->oncpu, cpu);
+	pcie_pmu->oncpu = cpu;
+}
+
+static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+	struct dwc_pcie_pmu *pcie_pmu;
+	struct pci_dev *pdev;
+	int node;
+
+	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
+	pdev = pcie_pmu->pdev;
+	node = dev_to_node(&pdev->dev);
+
+	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&
+	    cpu_to_node(cpu) == node)
+		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
+
+	return 0;
+}
+
+static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
+{
+	struct dwc_pcie_pmu *pcie_pmu;
+	struct pci_dev *pdev;
+	int node;
+	cpumask_t mask;
+	unsigned int target;
+
+	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
+	if (cpu != pcie_pmu->oncpu)
+		return 0;
+
+	pdev = pcie_pmu->pdev;
+	node = dev_to_node(&pdev->dev);
+	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
+	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
+		target = cpumask_any(&mask);
+	else
+		target = cpumask_any_but(cpu_online_mask, cpu);
+	if (target < nr_cpu_ids)
+		dwc_pcie_pmu_migrate(pcie_pmu, target);
+
+	return 0;
+}
+
+static struct platform_driver dwc_pcie_pmu_driver = {
+	.probe = dwc_pcie_pmu_probe,
+	.remove = dwc_pcie_pmu_remove,
+	.driver = {.name = "dwc_pcie_pmu",},
+};
+
+static int __init dwc_pcie_pmu_init(void)
+{
+	int ret;
+
+	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
+				      "perf/dwc_pcie_pmu:online",
+				      dwc_pcie_pmu_online_cpu,
+				      dwc_pcie_pmu_offline_cpu);
+	if (ret < 0)
+		return ret;
+
+	dwc_pcie_pmu_hp_state = ret;
+
+	ret = platform_driver_register(&dwc_pcie_pmu_driver);
+	if (ret) {
+		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
+		return ret;
+	}
+
+	dwc_pcie_pmu_dev = platform_device_register_simple(
+				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
+	if (IS_ERR(dwc_pcie_pmu_dev)) {
+		platform_driver_unregister(&dwc_pcie_pmu_driver);
+		return PTR_ERR(dwc_pcie_pmu_dev);
+	}
+
+	return 0;
+}
+
+static void __exit dwc_pcie_pmu_exit(void)
+{
+	platform_device_unregister(dwc_pcie_pmu_dev);
+	platform_driver_unregister(&dwc_pcie_pmu_driver);
+	cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
+}
+
+module_init(dwc_pcie_pmu_init);
+module_exit(dwc_pcie_pmu_exit);
+
+MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller");
+MODULE_AUTHOR("Shuai xue <xueshuai@linux.alibaba.com>");
+MODULE_AUTHOR("Wen Cheng <yinxuan_cw@linux.alibaba.com>");
+MODULE_LICENSE("GPL v2");
-- 
2.20.1.12.g72788fdb



* [PATCH v6 4/4] MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
  2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
                   ` (2 preceding siblings ...)
  2023-06-06  7:49 ` [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver Shuai Xue
@ 2023-06-06  7:49 ` Shuai Xue
  2023-06-16  8:39 ` [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
  4 siblings, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-06-06  7:49 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song, xueshuai

Add maintainers for Synopsys DesignWare PCIe PMU driver and driver
document.

Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
---
 MAINTAINERS | 6 ++++++
 1 file changed, 6 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 250518fc70ff..3f0aaf15469b 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -20483,6 +20483,12 @@ L:	linux-mmc@vger.kernel.org
 S:	Maintained
 F:	drivers/mmc/host/dw_mmc*
 
+SYNOPSYS DESIGNWARE PCIE PMU DRIVER
+M:	Shuai Xue <xueshuai@linux.alibaba.com>
+S:	Supported
+F:	Documentation/admin-guide/perf/dwc_pcie_pmu.rst
+F:	drivers/perf/dwc_pcie_pmu.c
+
 SYNOPSYS HSDK RESET CONTROLLER DRIVER
 M:	Eugeniy Paltsev <Eugeniy.Paltsev@synopsys.com>
 S:	Supported
-- 
2.20.1.12.g72788fdb



* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-06-06  7:49 ` [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver Shuai Xue
@ 2023-06-06 15:14   ` Yicong Yang
  2023-07-27  9:39     ` Jonathan Cameron
  2023-07-28  1:31     ` Shuai Xue
  0 siblings, 2 replies; 31+ messages in thread
From: Yicong Yang @ 2023-06-06 15:14 UTC (permalink / raw)
  To: Shuai Xue, chengyou, kaishen, helgaas, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: yangyicong, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On 2023/6/6 15:49, Shuai Xue wrote:
> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
> Core controller IP which provides statistics feature. The PMU is not a PCIe
> Root Complex integrated End Point(RCiEP) device but only register counters
> provided by each PCIe Root Port.
> 
> To facilitate collection of statistics the controller provides the
> following two features for each Root Port:
> 
> - Time Based Analysis (RX/TX data throughput and time spent in each
>   low-power LTSSM state)
> - Event counters (Error and Non-Error for lanes)
> 
> Note, only one counter for each type and does not overflow interrupt.
> 
> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
> named based the BDF of Root Port. For example,
> 
>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
> 
> the PMU device name for this Root Port is dwc_rootport_3018.
> 
> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> 
>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
> 
> average RX bandwidth can be calculated like this:
> 
>     PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Reported-by: kernel test robot <lkp@intel.com>
> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> ---
>  drivers/perf/Kconfig        |   7 +
>  drivers/perf/Makefile       |   1 +
>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>  3 files changed, 714 insertions(+)
>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
> 
> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> index 711f82400086..6ff3921d7a62 100644
> --- a/drivers/perf/Kconfig
> +++ b/drivers/perf/Kconfig
> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>  	  Enable perf support for Marvell DDR Performance monitoring
>  	  event on CN10K platform.
>  
> +config DWC_PCIE_PMU
> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
> +	depends on (ARM64 && PCI)
> +	help
> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
> +	  monitoring event on Yitian 710 platform.
> +
>  source "drivers/perf/arm_cspmu/Kconfig"
>  
>  source "drivers/perf/amlogic/Kconfig"
> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
> index dabc859540ce..13a6d1b286da 100644
> --- a/drivers/perf/Makefile
> +++ b/drivers/perf/Makefile
> @@ -22,5 +22,6 @@ obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
>  obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
>  obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
>  obj-$(CONFIG_ALIBABA_UNCORE_DRW_PMU) += alibaba_uncore_drw_pmu.o
> +obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
>  obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
>  obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
> diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
> new file mode 100644
> index 000000000000..8bfcf6e0662d
> --- /dev/null
> +++ b/drivers/perf/dwc_pcie_pmu.c
> @@ -0,0 +1,706 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Synopsys DesignWare PCIe PMU driver
> + *
> + * Copyright (C) 2021-2023 Alibaba Inc.
> + */
> +
> +#include <linux/bitfield.h>
> +#include <linux/bitops.h>
> +#include <linux/cpuhotplug.h>
> +#include <linux/cpumask.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/kernel.h>
> +#include <linux/list.h>
> +#include <linux/perf_event.h>
> +#include <linux/pci.h>
> +#include <linux/platform_device.h>
> +#include <linux/smp.h>
> +#include <linux/sysfs.h>
> +#include <linux/types.h>
> +
> +#define DWC_PCIE_VSEC_RAS_DES_ID		0x02
> +
> +#define DWC_PCIE_EVENT_CNT_CTL			0x8
> +
> +/*
> + * Event Counter Data Select includes two parts:
> + * - 27-24: Group number(4-bit: 0..0x7)
> + * - 23-16: Event number(8-bit: 0..0x13) within the Group
> + *
> + * Put them togother as TRM used.
> + */
> +#define DWC_PCIE_CNT_EVENT_SEL			GENMASK(27, 16)
> +#define DWC_PCIE_CNT_LANE_SEL			GENMASK(11, 8)
> +#define DWC_PCIE_CNT_STATUS			BIT(7)
> +#define DWC_PCIE_CNT_ENABLE			GENMASK(4, 2)
> +#define DWC_PCIE_PER_EVENT_OFF			0x1
> +#define DWC_PCIE_PER_EVENT_ON			0x3
> +#define DWC_PCIE_EVENT_CLEAR			GENMASK(1, 0)
> +#define DWC_PCIE_EVENT_PER_CLEAR		0x1
> +
> +#define DWC_PCIE_EVENT_CNT_DATA			0xC
> +
> +#define DWC_PCIE_TIME_BASED_ANAL_CTL		0x10
> +#define DWC_PCIE_TIME_BASED_REPORT_SEL		GENMASK(31, 24)
> +#define DWC_PCIE_TIME_BASED_DURATION_SEL	GENMASK(15, 8)
> +#define DWC_PCIE_DURATION_MANUAL_CTL		0x0
> +#define DWC_PCIE_DURATION_1MS			0x1
> +#define DWC_PCIE_DURATION_10MS			0x2
> +#define DWC_PCIE_DURATION_100MS			0x3
> +#define DWC_PCIE_DURATION_1S			0x4
> +#define DWC_PCIE_DURATION_2S			0x5
> +#define DWC_PCIE_DURATION_4S			0x6
> +#define DWC_PCIE_DURATION_4US			0xFF
> +#define DWC_PCIE_TIME_BASED_TIMER_START		BIT(0)
> +#define DWC_PCIE_TIME_BASED_CNT_ENABLE		0x1
> +
> +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW	0x14
> +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH	0x18
> +
> +/* Event attributes */
> +#define DWC_PCIE_CONFIG_EVENTID			GENMASK(15, 0)
> +#define DWC_PCIE_CONFIG_TYPE			GENMASK(19, 16)
> +#define DWC_PCIE_CONFIG_LANE			GENMASK(27, 20)
> +
> +#define DWC_PCIE_EVENT_ID(event)	FIELD_GET(DWC_PCIE_CONFIG_EVENTID, (event)->attr.config)
> +#define DWC_PCIE_EVENT_TYPE(event)	FIELD_GET(DWC_PCIE_CONFIG_TYPE, (event)->attr.config)
> +#define DWC_PCIE_EVENT_LANE(event)	FIELD_GET(DWC_PCIE_CONFIG_LANE, (event)->attr.config)
> +
> +enum dwc_pcie_event_type {
> +	DWC_PCIE_TYPE_INVALID,
> +	DWC_PCIE_TIME_BASE_EVENT,
> +	DWC_PCIE_LANE_EVENT,
> +};
> +
> +#define DWC_PCIE_LANE_EVENT_MAX_PERIOD		GENMASK_ULL(31, 0)
> +#define DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD	GENMASK_ULL(63, 0)
> +
> +
> +struct dwc_pcie_pmu {
> +	struct pci_dev		*pdev;		/* Root Port device */

If the root port is removed after this PCIe PMU driver has probed, we'll access a NULL
pointer. I don't see where you hold the root port to prevent that.

> +	u16			ras_des;	/* RAS DES capability offset */
> +	u32			nr_lanes;
> +
> +	struct list_head	pmu_node;
> +	struct hlist_node	cpuhp_node;
> +	struct pmu		pmu;
> +	struct perf_event	*event;
> +	int			oncpu;
> +};
> +
> +struct dwc_pcie_pmu_priv {
> +	struct device *dev;
> +	struct list_head pmu_nodes;
> +};
> +
> +#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
> +

Somebody told me to put @pmu as the first member, so that this macro needs no offset calculation. :)
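
A minimal user-space sketch of that point, not kernel code: the struct names and the
simplified container_of() below are hypothetical stand-ins, and only the member layout
matters. When the embedded pmu is the first member its offset is 0, so the conversion
is a plain pointer cast; otherwise a subtraction is needed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct pmu. */
struct pmu { int type; };

/* Layout as in the patch: pmu sits after other members. */
struct pmu_last {
	void *pdev;
	unsigned short ras_des;
	struct pmu pmu;
};

/* Suggested layout: pmu leads the struct. */
struct pmu_first {
	struct pmu pmu;
	void *pdev;
	unsigned short ras_des;
};

/* Simplified container_of(), without the kernel's type checking. */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* With pmu first, offsetof() is 0 and the pointer is returned unchanged. */
struct pmu_first *to_pmu_first(struct pmu *p)
{
	return container_of(p, struct pmu_first, pmu);
}

/* With pmu elsewhere, the pointer is moved back by a non-zero offset. */
struct pmu_last *to_pmu_last(struct pmu *p)
{
	return container_of(p, struct pmu_last, pmu);
}
```

Both helpers return the right container; the first layout just lets the compiler drop
the subtraction.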

> +static struct platform_device *dwc_pcie_pmu_dev;
> +static int dwc_pcie_pmu_hp_state;
> +
> +static ssize_t cpumask_show(struct device *dev,
> +					 struct device_attribute *attr,
> +					 char *buf)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(dev_get_drvdata(dev));
> +
> +	return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->oncpu));
> +}
> +static DEVICE_ATTR_RO(cpumask);
> +
> +static struct attribute *dwc_pcie_pmu_cpumask_attrs[] = {
> +	&dev_attr_cpumask.attr,
> +	NULL
> +};
> +
> +static struct attribute_group dwc_pcie_cpumask_attr_group = {
> +	.attrs = dwc_pcie_pmu_cpumask_attrs,
> +};
> +
> +struct dwc_pcie_format_attr {
> +	struct device_attribute attr;
> +	u64 field;
> +	int config;
> +};
> +
> +static ssize_t dwc_pcie_pmu_format_show(struct device *dev,
> +					struct device_attribute *attr,
> +					char *buf)
> +{
> +	struct dwc_pcie_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
> +	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
> +
> +	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
> +}
> +
> +#define _dwc_pcie_format_attr(_name, _cfg, _fld)				\
> +	(&((struct dwc_pcie_format_attr[]) {{					\
> +		.attr = __ATTR(_name, 0444, dwc_pcie_pmu_format_show, NULL),	\
> +		.config = _cfg,							\
> +		.field = _fld,							\
> +	}})[0].attr.attr)
> +
> +#define dwc_pcie_format_attr(_name, _fld)	_dwc_pcie_format_attr(_name, 0, _fld)
> +
> +static struct attribute *dwc_pcie_format_attrs[] = {
> +	dwc_pcie_format_attr(type, DWC_PCIE_CONFIG_TYPE),
> +	dwc_pcie_format_attr(eventid, DWC_PCIE_CONFIG_EVENTID),
> +	dwc_pcie_format_attr(lane, DWC_PCIE_CONFIG_LANE),
> +	NULL,
> +};
> +
> +static struct attribute_group dwc_pcie_format_attrs_group = {
> +	.name = "format",
> +	.attrs = dwc_pcie_format_attrs,
> +};
> +
> +struct dwc_pcie_event_attr {
> +	struct device_attribute attr;
> +	enum dwc_pcie_event_type type;
> +	u16 eventid;
> +	u8 lane;
> +};
> +
> +static ssize_t dwc_pcie_event_show(struct device *dev,
> +				struct device_attribute *attr, char *buf)
> +{
> +	struct dwc_pcie_event_attr *eattr;
> +
> +	eattr = container_of(attr, typeof(*eattr), attr);
> +
> +	if (eattr->type == DWC_PCIE_LANE_EVENT)
> +		return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
> +				  eattr->eventid, eattr->type);
> +
> +	return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", eattr->eventid,
> +		       eattr->type);
> +}
> +
> +#define DWC_PCIE_EVENT_ATTR(_name, _type, _eventid, _lane)		\
> +	(&((struct dwc_pcie_event_attr[]) {{				\
> +		.attr = __ATTR(_name, 0444, dwc_pcie_event_show, NULL),	\
> +		.type = _type,						\
> +		.eventid = _eventid,					\
> +		.lane = _lane,						\
> +	}})[0].attr.attr)
> +
> +#define DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(_name, _eventid)		\
> +	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_TIME_BASE_EVENT, _eventid, 0)
> +#define DWC_PCIE_PMU_LANE_EVENT_ATTR(_name, _eventid)			\
> +	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_LANE_EVENT, _eventid, 0)
> +
> +static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
> +	/* Group #0 */
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
> +
> +	/* Group #1 */
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
> +
> +	/*
> +	 * Leave it to the user to specify the lane ID to avoid generating
> +	 * a list of hundreds of events.
> +	 */
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
> +

Intended blank line?

> +	NULL
> +};
> +
> +static const struct attribute_group dwc_pcie_event_attrs_group = {
> +	.name = "events",
> +	.attrs = dwc_pcie_pmu_time_event_attrs,
> +};
> +
> +static const struct attribute_group *dwc_pcie_attr_groups[] = {
> +	&dwc_pcie_event_attrs_group,
> +	&dwc_pcie_format_attrs_group,
> +	&dwc_pcie_cpumask_attr_group,
> +	NULL
> +};
> +
> +static void dwc_pcie_pmu_lane_event_enable(struct dwc_pcie_pmu *pcie_pmu,
> +					   bool enable)
> +{
> +	struct pci_dev *pdev = pcie_pmu->pdev;
> +	u16 ras_des = pcie_pmu->ras_des;
> +	u32 val;
> +
> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, &val);
> +
> +	/* Clear DWC_PCIE_CNT_ENABLE field first */
> +	val &= ~DWC_PCIE_CNT_ENABLE;
> +	if (enable)
> +		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_ON);
> +	else
> +		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF);
> +
> +	pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, val);
> +}
> +
> +static void dwc_pcie_pmu_time_based_event_enable(struct dwc_pcie_pmu *pcie_pmu,
> +					  bool enable)
> +{
> +	struct pci_dev *pdev = pcie_pmu->pdev;
> +	u16 ras_des = pcie_pmu->ras_des;
> +	u32 val;
> +
> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
> +			      &val);
> +
> +	if (enable)
> +		val |= DWC_PCIE_TIME_BASED_CNT_ENABLE;
> +	else
> +		val &= ~DWC_PCIE_TIME_BASED_CNT_ENABLE;
> +
> +	pci_write_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
> +			       val);
> +}
> +
> +static u64 dwc_pcie_pmu_read_lane_event_counter(struct dwc_pcie_pmu *pcie_pmu)
> +{
> +	struct pci_dev *pdev = pcie_pmu->pdev;
> +	u16 ras_des = pcie_pmu->ras_des;
> +	u32 val;
> +
> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_DATA, &val);
> +
> +	return val;
> +}
> +
> +static u64 dwc_pcie_pmu_read_time_based_counter(struct dwc_pcie_pmu *pcie_pmu)
> +{
> +	struct pci_dev *pdev = pcie_pmu->pdev;
> +	u16 ras_des = pcie_pmu->ras_des;
> +	u64 count;
> +	u32 val;
> +
> +	pci_read_config_dword(
> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &val);
> +	count = val;
> +	count <<= 32;
> +
> +	pci_read_config_dword(
> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, &val);
> +
> +	count += val;
> +
> +	return count;
> +}
> +
> +static void dwc_pcie_pmu_event_update(struct perf_event *event)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +	struct hw_perf_event *hwc = &event->hw;
> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> +	u64 delta, prev, now;
> +
> +	do {
> +		prev = local64_read(&hwc->prev_count);
> +
> +		if (type == DWC_PCIE_LANE_EVENT)
> +			now = dwc_pcie_pmu_read_lane_event_counter(pcie_pmu);
> +		else if (type == DWC_PCIE_TIME_BASE_EVENT)
> +			now = dwc_pcie_pmu_read_time_based_counter(pcie_pmu);
> +
> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
> +
> +	if (type == DWC_PCIE_LANE_EVENT)
> +		delta = (now - prev) & DWC_PCIE_LANE_EVENT_MAX_PERIOD;
> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
> +		delta = (now - prev) & DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD;
> +
> +	local64_add(delta, &event->count);
> +}
> +
> +static int dwc_pcie_pmu_event_init(struct perf_event *event)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> +	struct perf_event *sibling;
> +	u32 lane;
> +
> +	if (event->attr.type != event->pmu->type)
> +		return -ENOENT;
> +
> +	/* We don't support sampling */
> +	if (is_sampling_event(event))
> +		return -EINVAL;
> +
> +	/* We cannot support task bound events */
> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
> +		return -EINVAL;
> +
> +	if (event->group_leader != event &&
> +	    !is_software_event(event->group_leader))
> +		return -EINVAL;
> +
> +	for_each_sibling_event(sibling, event->group_leader) {
> +		if (sibling->pmu != event->pmu && !is_software_event(sibling))
> +			return -EINVAL;
> +	}
> +
> +	if (type == DWC_PCIE_LANE_EVENT) {
> +		lane = DWC_PCIE_EVENT_LANE(event);
> +		if (lane < 0 || lane >= pcie_pmu->nr_lanes)
> +			return -EINVAL;
> +	}
> +
> +	event->cpu = pcie_pmu->oncpu;
> +
> +	return 0;
> +}
> +
> +static void dwc_pcie_pmu_set_period(struct hw_perf_event *hwc)
> +{
> +	local64_set(&hwc->prev_count, 0);
> +}
> +
> +static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
> +{
> +	struct hw_perf_event *hwc = &event->hw;
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> +
> +	hwc->state = 0;
> +	dwc_pcie_pmu_set_period(hwc);
> +
> +	if (type == DWC_PCIE_LANE_EVENT)
> +		dwc_pcie_pmu_lane_event_enable(pcie_pmu, true);
> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
> +		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
> +}
> +
> +static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> +	struct hw_perf_event *hwc = &event->hw;
> +
> +	if (event->hw.state & PERF_HES_STOPPED)
> +		return;
> +
> +	if (type == DWC_PCIE_LANE_EVENT)
> +		dwc_pcie_pmu_lane_event_enable(pcie_pmu, false);
> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
> +		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
> +
> +	dwc_pcie_pmu_event_update(event);
> +	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +}
> +
> +static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +	struct pci_dev *pdev = pcie_pmu->pdev;
> +	struct hw_perf_event *hwc = &event->hw;
> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> +	int event_id = DWC_PCIE_EVENT_ID(event);
> +	int lane = DWC_PCIE_EVENT_LANE(event);
> +	u16 ras_des = pcie_pmu->ras_des;
> +	u32 ctrl;
> +
> +	/* Only one counter and it is in use */
> +	if (pcie_pmu->event)
> +		return -ENOSPC;
> +
> +	pcie_pmu->event = event;
> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> +
> +	if (type == DWC_PCIE_LANE_EVENT) {
> +		/* EVENT_COUNTER_DATA_REG needs clear manually */
> +		ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
> +			FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
> +			FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
> +			FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
> +		pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL,
> +				       ctrl);
> +	} else if (type == DWC_PCIE_TIME_BASE_EVENT) {
> +		/*
> +		 * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
> +		 * use it with any manually controlled duration. And it is
> +		 * cleared when next measurement starts.
> +		 */
> +		ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
> +			FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
> +				   DWC_PCIE_DURATION_MANUAL_CTL) |
> +			DWC_PCIE_TIME_BASED_CNT_ENABLE;
> +		pci_write_config_dword(
> +			pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
> +	}
> +
> +	if (flags & PERF_EF_START)
> +		dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
> +
> +	perf_event_update_userpage(event);
> +
> +	return 0;
> +}
> +
> +static void dwc_pcie_pmu_event_del(struct perf_event *event, int flags)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> +
> +	dwc_pcie_pmu_event_stop(event, flags | PERF_EF_UPDATE);
> +	perf_event_update_userpage(event);
> +	pcie_pmu->event = NULL;
> +}
> +
> +static int __dwc_pcie_pmu_probe(struct dwc_pcie_pmu_priv *priv)
> +{
> +	struct pci_dev *pdev = NULL;
> +	struct dwc_pcie_pmu *pcie_pmu;
> +	char *name;
> +	u32 bdf;
> +	int ret;
> +
> +	INIT_LIST_HEAD(&priv->pmu_nodes);
> +
> +	/* Match the rootport with VSEC_RAS_DES_ID, and register a PMU for it */
> +	for_each_pci_dev(pdev) {
> +		u16 vsec;
> +		u32 val;
> +
> +		if (!(pci_is_pcie(pdev) &&
> +		      pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
> +			continue;
> +
> +		vsec = pci_find_vsec_capability(pdev, PCI_VENDOR_ID_ALIBABA,
> +						DWC_PCIE_VSEC_RAS_DES_ID);
> +		if (!vsec)
> +			continue;
> +
> +		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
> +		if (PCI_VNDR_HEADER_REV(val) != 0x04 ||
> +		    PCI_VNDR_HEADER_LEN(val) != 0x100)
> +			continue;
> +		pci_dbg(pdev,
> +			"Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
> +
> +		bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
> +		name = devm_kasprintf(priv->dev, GFP_KERNEL, "dwc_rootport_%x",
> +				      bdf);
> +		if (!name)
> +			return -ENOMEM;
> +
> +		/* All checks passed, go go go */
> +		pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
> +		if (!pcie_pmu) {
> +			pci_dev_put(pdev);

We need to call pci_dev_put() on every return branch, both above and below, and after the
for_each_pci_dev() loop to keep the refcount balanced.
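
A rough user-space model of why the early returns matter, assuming the usual iterator
behaviour where fetching the next device drops the reference on the previous one and
takes one on the new one (as pci_get_device() does). Everything named fake_* is
hypothetical. It models only the iterator's own reference, not any extra reference the
driver might take for the pdev pointer it stores.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct pci_dev: only the refcount is modelled. */
struct fake_dev { int refcnt; };

#define NR_DEVS 3
struct fake_dev devs[NR_DEVS];

void fake_dev_put(struct fake_dev *d)
{
	if (d)
		d->refcnt--;
}

/* Drop the reference on @from, then take one on the next device (if any). */
struct fake_dev *fake_get_next(struct fake_dev *from)
{
	size_t next = from ? (size_t)(from - devs) + 1 : 0;

	fake_dev_put(from);
	if (next >= NR_DEVS)
		return NULL;
	devs[next].refcnt++;
	return &devs[next];
}

/* Walk all devices; bail out at index @fail_at, optionally dropping the ref. */
int walk(int fail_at, int put_on_error)
{
	struct fake_dev *d = NULL;

	while ((d = fake_get_next(d)) != NULL) {
		if (d - devs == fail_at) {
			if (put_on_error)
				fake_dev_put(d);	/* balances the iterator's get */
			return -1;
		}
	}
	return 0;
}

/* Sum of all outstanding references; non-zero means something leaked. */
int total_refs(void)
{
	int sum = 0;

	for (size_t i = 0; i < NR_DEVS; i++)
		sum += devs[i].refcnt;
	return sum;
}
```

In this model, an early return without the put leaves one reference outstanding on the
device the loop stopped at, which is the imbalance described above.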

> +			return -ENOMEM;
> +		}
> +
> +		pcie_pmu->pdev = pdev;
> +		pcie_pmu->ras_des = vsec;
> +		pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
> +		pcie_pmu->pmu = (struct pmu){
> +			.module		= THIS_MODULE,
> +			.attr_groups	= dwc_pcie_attr_groups,
> +			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
> +			.task_ctx_nr	= perf_invalid_context,
> +			.event_init	= dwc_pcie_pmu_event_init,
> +			.add		= dwc_pcie_pmu_event_add,
> +			.del		= dwc_pcie_pmu_event_del,
> +			.start		= dwc_pcie_pmu_event_start,
> +			.stop		= dwc_pcie_pmu_event_stop,
> +			.read		= dwc_pcie_pmu_event_update,
> +		};
> +
> +		/* Add this instance to the list used by the offline callback */
> +		ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
> +					       &pcie_pmu->cpuhp_node);
> +		if (ret) {
> +			pci_err(pcie_pmu->pdev,
> +				"Error %d registering hotplug @%x\n", ret, bdf);
> +			return ret;
> +		}
> +		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
> +		if (ret) {
> +			pci_err(pcie_pmu->pdev,
> +				"Error %d registering PMU @%x\n", ret, bdf);
> +			cpuhp_state_remove_instance_nocalls(
> +				dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node);
> +			return ret;
> +		}
> +
> +		/* Add registered PMUs and unregister them when this driver remove */
> +		list_add(&pcie_pmu->pmu_node, &priv->pmu_nodes);
> +	}
> +
> +	return 0;
> +}
> +
> +static int dwc_pcie_pmu_remove(struct platform_device *pdev)
> +{
> +	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
> +	struct dwc_pcie_pmu *pcie_pmu;
> +
> +	list_for_each_entry(pcie_pmu, &priv->pmu_nodes, pmu_node) {
> +		cpuhp_state_remove_instance(dwc_pcie_pmu_hp_state,
> +					    &pcie_pmu->cpuhp_node);
> +		perf_pmu_unregister(&pcie_pmu->pmu);

This should unregister the PMU first, keeping the order the reverse of __dwc_pcie_pmu_probe().

> +	}
> +
> +	return 0;
> +}
> +
> +static int dwc_pcie_pmu_probe(struct platform_device *pdev)
> +{
> +	struct dwc_pcie_pmu_priv *priv;
> +	int ret;
> +
> +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> +	if (!priv)
> +		return -ENOMEM;
> +
> +	priv->dev = &pdev->dev;
> +	platform_set_drvdata(pdev, priv);
> +
> +	/* If one PMU registration fails, remove all. */
> +	ret = __dwc_pcie_pmu_probe(priv);
> +	if (ret) {
> +		dwc_pcie_pmu_remove(pdev);
> +		return ret;
> +	}
> +
> +	return 0;
> +}
> +
> +static void dwc_pcie_pmu_migrate(struct dwc_pcie_pmu *pcie_pmu, unsigned int cpu)
> +{
> +	/* This PMU does NOT support interrupt, just migrate context. */
> +	perf_pmu_migrate_context(&pcie_pmu->pmu, pcie_pmu->oncpu, cpu);
> +	pcie_pmu->oncpu = cpu;
> +}
> +
> +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu;
> +	struct pci_dev *pdev;
> +	int node;
> +
> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> +	pdev = pcie_pmu->pdev;
> +	node = dev_to_node(&pdev->dev);
> +
> +	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&
> +	    cpu_to_node(cpu) == node)
> +		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
> +
> +	return 0;
> +}
> +
> +static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> +{
> +	struct dwc_pcie_pmu *pcie_pmu;
> +	struct pci_dev *pdev;
> +	int node;
> +	cpumask_t mask;
> +	unsigned int target;
> +
> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> +	if (cpu != pcie_pmu->oncpu)
> +		return 0;
> +
> +	pdev = pcie_pmu->pdev;
> +	node = dev_to_node(&pdev->dev);
> +	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
> +	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
> +		target = cpumask_any(&mask);

cpumask_of_node() only contains the online CPUs, so this branch is redundant. For arm64,
which uses arch_numa.c, the node cpumask is updated in numa_{add, remove}_cpu(), and the
behaviour on other architectures should be consistent. Please correct me if I'm wrong.
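
A small userspace sketch of the simplified target selection, under the assumption stated
above that the node mask is always a subset of the online mask (whether that holds on every
architecture is exactly the open question). CPUs are modelled as bits in a 64-bit word; the
toy_* names and TOY_NO_CPU are hypothetical and not part of the cpumask API.

```c
#include <assert.h>
#include <stdint.h>

#define TOY_NO_CPU 64u

/* Index of the lowest set bit, or TOY_NO_CPU if the mask is empty. */
static unsigned int toy_first_cpu(uint64_t mask)
{
	for (unsigned int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return TOY_NO_CPU;
}

/*
 * Pick a migration target when 'cpu' goes offline: prefer another CPU on
 * the same node, otherwise fall back to any other online CPU.
 */
static unsigned int toy_pick_target(uint64_t node_mask, uint64_t online_mask,
				    unsigned int cpu)
{
	uint64_t self, local;

	assert(cpu < 64);
	/* Assumption: the node mask never contains offline CPUs. */
	assert((node_mask & ~online_mask) == 0);

	self = UINT64_C(1) << cpu;
	local = node_mask & ~self;

	/* No extra AND with the online mask is needed under the assumption. */
	if (local)
		return toy_first_cpu(local);
	return toy_first_cpu(online_mask & ~self);
}
```

If the assumption does not hold on some architecture, the AND with cpu_online_mask in the
patch is still required there.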

> +	else
> +		target = cpumask_any_but(cpu_online_mask, cpu);
> +	if (target < nr_cpu_ids)
> +		dwc_pcie_pmu_migrate(pcie_pmu, target);
> +
> +	return 0;
> +}
> +
> +static struct platform_driver dwc_pcie_pmu_driver = {
> +	.probe = dwc_pcie_pmu_probe,
> +	.remove = dwc_pcie_pmu_remove,
> +	.driver = {.name = "dwc_pcie_pmu",},
> +};
> +
> +static int __init dwc_pcie_pmu_init(void)
> +{
> +	int ret;
> +
> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> +				      "perf/dwc_pcie_pmu:online",
> +				      dwc_pcie_pmu_online_cpu,
> +				      dwc_pcie_pmu_offline_cpu);
> +	if (ret < 0)
> +		return ret;
> +
> +	dwc_pcie_pmu_hp_state = ret;
> +
> +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
> +	if (ret) {
> +		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
> +		return ret;
> +	}
> +
> +	dwc_pcie_pmu_dev = platform_device_register_simple(
> +				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
> +	if (IS_ERR(dwc_pcie_pmu_dev)) {
> +		platform_driver_unregister(&dwc_pcie_pmu_driver);

On failure we also need to remove the cpuhp state.
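
As a rough illustration, here is a userspace model of the usual goto-based unwinding, where
each failure path undoes every earlier step in reverse order. The three flags stand in for
the hotplug state, the platform driver and the platform device; toy_init(), its fail_at knob
and toy_is_clean() are hypothetical names used only for this sketch.

```c
#include <assert.h>

/* Toy flags: 1 means the corresponding resource is currently set up. */
struct toy_state {
	int hp_state;
	int driver;
	int device;
};

/*
 * fail_at selects which step fails (1..3), or 0 for success. Every error
 * label releases what the earlier steps acquired, newest first.
 */
static int toy_init(struct toy_state *s, int fail_at)
{
	assert(fail_at >= 0 && fail_at <= 3);

	if (fail_at == 1)
		return -1;
	s->hp_state = 1;

	if (fail_at == 2)
		goto err_remove_state;
	s->driver = 1;

	if (fail_at == 3)
		goto err_unregister_driver;
	s->device = 1;

	return 0;

err_unregister_driver:
	s->driver = 0;
err_remove_state:
	s->hp_state = 0;
	return -1;
}

/* True when no resource is left behind. */
static int toy_is_clean(const struct toy_state *s)
{
	return !s->hp_state && !s->driver && !s->device;
}
```

A failure at the last step then leaves nothing registered, which is the behaviour asked for
here.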

Thanks,
Yicong

> +		return PTR_ERR(dwc_pcie_pmu_dev);
> +	}
> +
> +	return 0;
> +}
> +
> +static void __exit dwc_pcie_pmu_exit(void)
> +{
> +	platform_device_unregister(dwc_pcie_pmu_dev);
> +	platform_driver_unregister(&dwc_pcie_pmu_driver);
> +	cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
> +}
> +
> +module_init(dwc_pcie_pmu_init);
> +module_exit(dwc_pcie_pmu_exit);
> +
> +MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller");
> +MODULE_AUTHOR("Shuai xue <xueshuai@linux.alibaba.com>");
> +MODULE_AUTHOR("Wen Cheng <yinxuan_cw@linux.alibaba.com>");
> +MODULE_LICENSE("GPL v2");
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  2023-06-06  7:49 ` [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h Shuai Xue
@ 2023-06-06 15:31   ` Bjorn Helgaas
  2023-06-07  0:42     ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2023-06-06 15:31 UTC (permalink / raw)
  To: Shuai Xue
  Cc: chengyou, kaishen, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song

On Tue, Jun 06, 2023 at 03:49:36PM +0800, Shuai Xue wrote:
> The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma")
> and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the
> Vendor ID to linux/pci_ids.h so that it can be shared by several drivers
> later.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>

Acked-by: Bjorn Helgaas <bhelgaas@google.com>

> ---
>  drivers/infiniband/hw/erdma/erdma_hw.h | 2 --
>  include/linux/pci_ids.h                | 2 ++
>  2 files changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/infiniband/hw/erdma/erdma_hw.h b/drivers/infiniband/hw/erdma/erdma_hw.h
> index 76ce2856be28..ee35ebef9ee7 100644
> --- a/drivers/infiniband/hw/erdma/erdma_hw.h
> +++ b/drivers/infiniband/hw/erdma/erdma_hw.h
> @@ -11,8 +11,6 @@
>  #include <linux/types.h>
>  
>  /* PCIe device related definition. */
> -#define PCI_VENDOR_ID_ALIBABA 0x1ded
> -
>  #define ERDMA_PCI_WIDTH 64
>  #define ERDMA_FUNC_BAR 0
>  #define ERDMA_MISX_BAR 2
> diff --git a/include/linux/pci_ids.h b/include/linux/pci_ids.h
> index 95f33dadb2be..9e8aec472f06 100644
> --- a/include/linux/pci_ids.h
> +++ b/include/linux/pci_ids.h
> @@ -2586,6 +2586,8 @@
>  #define PCI_VENDOR_ID_TEKRAM		0x1de1
>  #define PCI_DEVICE_ID_TEKRAM_DC290	0xdc29
>  
> +#define PCI_VENDOR_ID_ALIBABA		0x1ded
> +
>  #define PCI_VENDOR_ID_TEHUTI		0x1fc9
>  #define PCI_DEVICE_ID_TEHUTI_3009	0x3009
>  #define PCI_DEVICE_ID_TEHUTI_3010	0x3010
> -- 
> 2.20.1.12.g72788fdb
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h
  2023-06-06 15:31   ` Bjorn Helgaas
@ 2023-06-07  0:42     ` Shuai Xue
  0 siblings, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-06-07  0:42 UTC (permalink / raw)
  To: Bjorn Helgaas
  Cc: chengyou, kaishen, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song



On 2023/6/6 23:31, Bjorn Helgaas wrote:
> On Tue, Jun 06, 2023 at 03:49:36PM +0800, Shuai Xue wrote:
>> The Alibaba Vendor ID (0x1ded) is now used by Alibaba elasticRDMA ("erdma")
>> and will be shared with the upcoming PCIe PMU ("dwc_pcie_pmu"). Move the
>> Vendor ID to linux/pci_ids.h so that it can be shared by several drivers
>> later.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> 
> Acked-by: Bjorn Helgaas <bhelgaas@google.com>
> 

Thank you :)

Cheers,
Shuai


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
                   ` (3 preceding siblings ...)
  2023-06-06  7:49 ` [PATCH v6 4/4] MAINTAINERS: add maintainers for " Shuai Xue
@ 2023-06-16  8:39 ` Shuai Xue
  2023-07-10 12:04   ` Shuai Xue
  4 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-06-16  8:39 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song



On 2023/6/6 15:49, Shuai Xue wrote:
> changes since v5:
> - Rewrite the commit log to follow policy in pci_ids.h (Bjorn Helgaas)
> - return error code when __dwc_pcie_pmu_probe failed (Baolin Wang)
> - call 'cpuhp_remove_multi_state()' when exiting the driver. (Baolin Wang)
> - pick up Reviewed-by tag from Baolin for Patch 1 and 3
> 
> changes since v4:
> 
> 1. addressing comments from Bjorn Helgaas:
> - reorder the includes by alpha
> - change all macros with upper-case hex
> - change ras_des type into u16
> - remove unnecessary outer "()"
> - minor format changes
> 
> 2. Address comments from Jonathan Cameron:
> - rewrite doc and add an example to show how to use lane event
> 
> 3. fix compile error reported by: kernel test robot
> - remove COMPILE_TEST and add depend on PCI in kconfig
> - add Reported-by: kernel test robot <lkp@intel.com>
> 
> Changes since v3:
> 
> 1. addressing comments from Robin Murphy:
> - add a prepare patch to define pci id in linux/pci_ids.h
> - remove unnecessary 64BIT dependency
> - fix DWC_PCIE_PER_EVENT_OFF/ON macro
> - remove dwc_pcie_pmu struct and move all its fields into dwc_pcie_rp_info
> - remove unnecessary format field show
> - use sysfs_emit() instead of all the assorted sprintf() and snprintf() calls.
> - remove unnecessary spaces and remove unnecessary cast to follow event show convention
> - remove pcie_pmu_event_attr_is_visible
> - fix a refcount leak on error branch when walking pci devices in for_each_pci_dev
> - remove bdf field from dwc_pcie_rp_info and calculate it at runtime
> - finish all the checks before allocating rp_info to avoid hanging wasted memory
> - remove some unused fields
> - wrap out control register configuration from sub function to .add()
> - make function return type with a proper signature
> - fix lane event count enable by clear DWC_PCIE_CNT_ENABLE field first
> - pass rp_info directly to the read_*_counter helpers and in start, stop and add callbacks
> - move event type validation into .event_init()
> - use is_sampling_event() to be consistent with everything else of pmu drivers
> - remove unnecessary dev_err message in .event_init()
> - return EINVAL instead of EOPNOTSUPP for an invalid event
> - finish all the checks before start modifying the event
> - fix sibling event check by comparing event->pmu with sibling->pmu
> - probe PMU for each rootport independently
> - use .update() as .read() directly
> - remove dynamically generating symbolic name of lane event
> - redefine static symbolic name of lane event and leave lane field to user
> - add CPU hotplug support
> 
> 2. addressing comments from Baolin:
> - add a mask to avoid possible overflow
> 
> Changes since v2 addressing comments from Baolin:
> - remove redundant macro definitions
> - use dev_err to print error message
> - change pmu_is_register to boolean
> - use PLATFORM_DEVID_NONE macro
> - fix module author format
> 
> Changes since v1:
> 
> 1. address comments from Jonathan:
> - drop macro for PMU name and VSEC version
> - simplify code with PCI standard macro
> - simplify code with FIELD_PREP()/FIELD_GET() to replace shift macro
> - name register field with single _ instead of double
> - wrap dwc_pcie_pmu_{write}_dword out and drop meaningless sanity check
> - check vendor id while matching vsec with pci_find_vsec_capability()
> - remove RP_NUM_MAX and use a list to organize PMU devices for rootports
> - replace DWC_PCIE_CREATE_BDF with standard PCI_DEVID
> - comments on riping register together
> 
> 2. address comments from Bjorn:
> - rename DWC_PCIE_VSEC_ID to DWC_PCIE_VSEC_RAS_DES_ID
> - rename cap_pos to ras_des
> - simplify declare of device_attribute with DEVICE_ATTR_RO
> - simplify code with PCI standard macro and API like pcie_get_width_cap()
> - fix some code style problem and typo
> - drop meaningless sanity check of container_of
> 
> 3. address comments from Yicong:
> - use sysfs_emit() to replace sprintf()
> - simplify iteration of pci device with for_each_pci_dev
> - pick preferred CPUs on a near die and add comments
> - unregister PMU drivers only for failed ones
> - log on behalf PMU device and give more hint
> - fix some code style problem
> 
> (Thanks for all comments and they are very valuable to me)
> 
> This patchset adds the PCIe Performance Monitoring Unit (PMU) driver support
> for T-Head Yitian 710 SoC chip. Yitian 710 is based on the Synopsys PCI Express
> Core controller IP which provides statistics feature.
> 
> Shuai Xue (4):
>   docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
>   PCI: Add Alibaba Vendor ID to linux/pci_ids.h
>   drivers/perf: add DesignWare PCIe PMU driver
>   MAINTAINERS: add maintainers for DesignWare PCIe PMU driver
> 
>  .../admin-guide/perf/dwc_pcie_pmu.rst         |  97 +++
>  Documentation/admin-guide/perf/index.rst      |   1 +
>  MAINTAINERS                                   |   6 +
>  drivers/infiniband/hw/erdma/erdma_hw.h        |   2 -
>  drivers/perf/Kconfig                          |   7 +
>  drivers/perf/Makefile                         |   1 +
>  drivers/perf/dwc_pcie_pmu.c                   | 706 ++++++++++++++++++
>  include/linux/pci_ids.h                       |   2 +
>  8 files changed, 820 insertions(+), 2 deletions(-)
>  create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
> 

Hi, all,

Gentle ping. Any comments are welcome.

Thank you.

Best Regards,
Shuai



^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-06-16  8:39 ` [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
@ 2023-07-10 12:04   ` Shuai Xue
  2023-07-24  2:34     ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-07-10 12:04 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song



On 2023/6/16 16:39, Shuai Xue wrote:
> 
> 
> 
> Hi, all,
> 
> Gentle ping. Any comments are welcome.


Hi, all,

Gentle ping.


> 
> Thank you.
>
> 
> Best Regards,
> Shuai
> 
> 

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-10 12:04   ` Shuai Xue
@ 2023-07-24  2:34     ` Shuai Xue
  2023-07-24  9:18       ` Jonathan Cameron
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-07-24  2:34 UTC (permalink / raw)
  To: chengyou, kaishen, helgaas, yangyicong, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song



On 2023/7/10 20:04, Shuai Xue wrote:
> 
> 
> On 2023/6/16 16:39, Shuai Xue wrote:
>>
>>
>>
>> Hi, all,
>>
>> Gentle ping. Any comments are welcome.
> 
> 
> Hi, all,
> 
> Gentle ping.
> 

Hi, all

Gentle reminder, thank you.

>>
>> Thank you.
>>
>>
>> Best Regards,
>> Shuai
>>
>>

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-24  2:34     ` Shuai Xue
@ 2023-07-24  9:18       ` Jonathan Cameron
  2023-07-24 12:13         ` Shuai Xue
  2023-07-25 20:59         ` Bjorn Helgaas
  0 siblings, 2 replies; 31+ messages in thread
From: Jonathan Cameron @ 2023-07-24  9:18 UTC (permalink / raw)
  To: Shuai Xue
  Cc: chengyou, kaishen, helgaas, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On Mon, 24 Jul 2023 10:34:08 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> On 2023/7/10 20:04, Shuai Xue wrote:
> > 
> > 
> > On 2023/6/16 16:39, Shuai Xue wrote:  
> >>
> >>
> >>
> >> Hi, all,
> >>
> >> Gentle ping. Any comments are welcome.
> > 
> > 
> > Hi, all,
> > 
> > Gentle ping.
> >   
> 
> Hi, all
> 
> Gentle reminder, thank you.

Hi Shuai,

Really a question for Bjorn I think, but here is my 2 cents...

The problem here is that we need to do that fundamental redesign of the
way the PCI port drivers work.  I'm not sure there is a path to merging
this until that is done.  The bigger problem is that I'm not sure anyone
is actively looking at that yet.  I'd like to look at this (as I have
the same problem for some other drivers), but it is behind various
other things on my todo list.

Bjorn might be persuaded on a temporary solution, but that would come
with some maintenance problems, particularly when we try to do it
'right' in the future.  Maybe adding another service driver would be
a stop gap as long as we know we won't keep doing so for ever. Not sure.

Jonathan

> 
> >>
> >> Thank you.
> >>
> >>
> >> Best Regards,
> >> Shuai
> >>
> >>  
> 



* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-24  9:18       ` Jonathan Cameron
@ 2023-07-24 12:13         ` Shuai Xue
  2023-07-25 20:59         ` Bjorn Helgaas
  1 sibling, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-07-24 12:13 UTC (permalink / raw)
  To: Jonathan Cameron, Bjorn Helgaas
  Cc: chengyou, kaishen, helgaas, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/7/24 17:18, Jonathan Cameron wrote:
> Really a question for Bjorn I think, but here is my 2 cents...
> 
> The problem here is that we need to do that fundamental redesign of the
> way the PCI port drivers work.  I'm not sure there is a path to merging
> this until that is done.  The bigger problem is that I'm not sure anyone
> is actively looking at that yet.  I'd like to look at this (as I have
> the same problem for some other drivers), but it is behind various
> other things on my todo list.
> 
> Bjorn might be persuaded on a temporary solution, but that would come
> with some maintenance problems, particularly when we try to do it
> 'right' in the future.  Maybe adding another service driver would be
> a stop gap as long as we know we won't keep doing so for ever. Not sure.

Thank you for your reply; I get your point. :)

+ Bjorn


>>>> The approach used here is to separately walk the PCI topology and
>>>> register the devices.  It can 'maybe' get away with that because no
>>>> interrupts and I assume resets have no nasty impacts on it because
>>>> the device is fairly simple.  In general that's not going to work.
>>>> CXL does a similar trick (which I don't much like, but too late
>>>> now), but we've also run into the problem of how to get interrupts
>>>> if not the main driver.
>>>
>>> Yes, this is a real problem.  I think the "walk all PCI devices
>>> looking for one we like" approach is terrible because it breaks a lot
>>> of driver model assumptions (no device ID to autoload module via udev,
>>> hotplug doesn't work, etc), but we don't have a good alternative right
>>> now.
>>>
>>> I think portdrv is slightly better because at least it claims the
>>> device in the usual way and gives a way for service drivers to
>>> register with it.  But I don't really like that either because it
>>> created a new weird /sys/bus/pci_express hierarchy full of these
>>> sub-devices that aren't really devices, and it doesn't solve the
>>> module load and hotplug issues.
>>>
>>> I would like to have portdrv be completely built into the PCI core and
>>> not claim Root Ports or Switch Ports.  Then those devices would be
>>> available via the usual driver model for driver loading and binding
>>> and for hotplug.
>>
>> Let me see if I understand this correctly as I can think of a few options
>> that perhaps are in line with what you are thinking.
>>
>> 1) All the portdrv stuff converted to normal PCI core helper functions
>>    that a driver bound to the struct pci_dev can use.
>> 2) Driver core itself provides a bunch of extra devices alongside the
>>    struct pci_dev one to which additional drivers can bind? - so kind
>>    of portdrv handling, but squashed into the PCI device topology?
>> 3) Have portdrv operate under the hood, so all the services etc that
>>    it provides don't require a driver to be bound at all.  Then
>>    allow usual VID/DID based driver binding.
>>
>> If 1 - we are going to run into class device restrictions and that will
>> just move where we have to handle the potential vendor specific parts.
>> We probably don't want that to be a hydra with all the functionality
>> and lookups etc driven from there, so do we end up with sub devices
>> of that new PCI port driver with a discover method based on either
>> vsec + VID or DVSEC with devices created under the main pci_dev.
>> That would have to include nastiness around interrupt discovery for
>> those sub devices. So ends up roughly like port_drv.
>>
>> I don't think 2 solves anything.
>>
>> For 3 - interrupts and ownership of facilities is going to be tricky
>> as initially those need to be owned by the PCI core (no device driver bound)
>> and then I guess handed off to the driver once it shows up?  Maybe that
>> driver should call a pci_claim_port() that gives it control of everything
>> and pci_release_port() that hands it all back to the core.  That seems
>> racy.
>
> Yes, 3 is the option I want to explore.  That's what we already do for
> things like ASPM.  Agreed, interrupts are a potential issue.  I think
> the architected parts of config space should be implicitly owned by
> the PCI core, with interfaces à la pci_disable_link_state() if drivers
> need them.
>
> Bjorn
> https://lore.kernel.org/lkml/ZGUAWxoEngmqFcLJ@bhelgaas/

@Bjorn, is there a path to merging this patch set before your exploration is
done?  And are you actively looking at that yet?

I am not very familiar with the PCI core, but I would like to help work on that.

Thank you.

Best Regards,
Shuai


* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-24  9:18       ` Jonathan Cameron
  2023-07-24 12:13         ` Shuai Xue
@ 2023-07-25 20:59         ` Bjorn Helgaas
  2023-07-27  3:45           ` Shuai Xue
  1 sibling, 1 reply; 31+ messages in thread
From: Bjorn Helgaas @ 2023-07-25 20:59 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: Shuai Xue, chengyou, kaishen, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On Mon, Jul 24, 2023 at 10:18:07AM +0100, Jonathan Cameron wrote:
> On Mon, 24 Jul 2023 10:34:08 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> > On 2023/7/10 20:04, Shuai Xue wrote:
> > > On 2023/6/16 16:39, Shuai Xue wrote:  
> > >> On 2023/6/6 15:49, Shuai Xue wrote:  

> > >>> This patchset adds the PCIe Performance Monitoring Unit (PMU) driver support
> > >>> for T-Head Yitian 710 SoC chip. Yitian 710 is based on the Synopsys PCI Express
> > >>> Core controller IP which provides statistics feature.

> ...
> Really a question for Bjorn I think, but here is my 2 cents...
> 
> The problem here is that we need to do that fundamental redesign of the
> way the PCI port drivers work.  I'm not sure there is a path to merging
> this until that is done.  The bigger problem is that I'm not sure anyone
> is actively looking at that yet.  I'd like to look at this (as I have
> the same problem for some other drivers), but it is behind various
> other things on my todo list.
> 
> Bjorn might be persuaded on a temporary solution, but that would come
> with some maintenance problems, particularly when we try to do it
> 'right' in the future.  Maybe adding another service driver would be
> a stop gap as long as we know we won't keep doing so for ever. Not sure.

I think the question here is around the for_each_pci_dev() in
__dwc_pcie_pmu_probe()?  I don't *like* that because of the
assumptions it breaks (autoload doesn't work, hotplug doesn't work),
but:

  - There are several other drivers that also do this,
  - I don't have a better suggestion for any of them,
  - It's not a drivers/pci thing, so not really up to me anyway,

so I don't have any problem with this being merged as-is, as long as
you can live with the limitations.

I don't think this series does anything to work around those
limitations, i.e., it doesn't make up fake device IDs for module
loading or fake events for hotplug, so it seems like we could improve
the implementation later if we ever have a way to do it.

Bjorn


* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-25 20:59         ` Bjorn Helgaas
@ 2023-07-27  3:45           ` Shuai Xue
  2023-07-28 13:39             ` Will Deacon
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-07-27  3:45 UTC (permalink / raw)
  To: Bjorn Helgaas, Jonathan Cameron, Will Deacon
  Cc: chengyou, kaishen, yangyicong, will, baolin.wang, robin.murphy,
	linux-kernel, linux-arm-kernel, linux-pci, rdunlap, mark.rutland,
	zhuo.song



On 2023/7/26 04:59, Bjorn Helgaas wrote:
> On Mon, Jul 24, 2023 at 10:18:07AM +0100, Jonathan Cameron wrote:
>> On Mon, 24 Jul 2023 10:34:08 +0800
>> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>>> On 2023/7/10 20:04, Shuai Xue wrote:
>>>> On 2023/6/16 16:39, Shuai Xue wrote:  
>>>>> On 2023/6/6 15:49, Shuai Xue wrote:  
> 
>>>>>> This patchset adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>> for T-Head Yitian 710 SoC chip. Yitian 710 is based on the Synopsys PCI Express
>>>>>> Core controller IP which provides statistics feature.
> 
>> ...
>> Really a question for Bjorn I think, but here is my 2 cents...
>>
>> The problem here is that we need to do that fundamental redesign of the
>> way the PCI port drivers work.  I'm not sure there is a path to merging
>> this until that is done.  The bigger problem is that I'm not sure anyone
>> is actively looking at that yet.  I'd like to look at this (as I have
>> the same problem for some other drivers), but it is behind various
>> other things on my todo list.
>>
>> Bjorn might be persuaded on a temporary solution, but that would come
>> with some maintenance problems, particularly when we try to do it
>> 'right' in the future.  Maybe adding another service driver would be
>> a stop gap as long as we know we won't keep doing so for ever. Not sure.
> 
> I think the question here is around the for_each_pci_dev() in
> __dwc_pcie_pmu_probe()?  I don't *like* that because of the
> assumptions it breaks (autoload doesn't work, hotplug doesn't work),
> but:
> 
>   - There are several other drivers that also do this,
>   - I don't have a better suggestion for any of them,
>   - It's not a drivers/pci thing, so not really up to me anyway,
> 
> so I don't have any problem with this being merged as-is, as long as
> you can live with the limitations.
> 
> I don't think this series does anything to work around those
> limitations, i.e., it doesn't make up fake device IDs for module
> loading or fake events for hotplug, so it seems like we could improve
> the implementation later if we ever have a way to do it.
> 
> Bjorn

+ Will

OK, thank you for the confirmation, Bjorn. That leaves the perf driver parts,
which I think are really a question for @Will.

What's your opinion about merging this patch set, @Will?

Best Regards,
Shuai



* Re: [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  2023-06-06  7:49 ` [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Shuai Xue
@ 2023-07-27  8:57   ` Jonathan Cameron
  2023-07-27 12:52     ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2023-07-27  8:57 UTC (permalink / raw)
  To: Shuai Xue
  Cc: chengyou, kaishen, helgaas, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On Tue, 6 Jun 2023 15:49:35 +0800
Shuai Xue <xueshuai@linux.alibaba.com> wrote:

> Alibaba's T-Head Yitian 710 SoC includes Synopsys' DesignWare Core PCIe
> controller which implements a PMU for performance and
> functional debugging to facilitate system maintenance.
> 
> Document it to provide guidance on how to use it.
> 
> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

Given this looks like it might move forwards (after Bjorn's reply)
I'll give it a closer review :)

Some editorial things in here only. What you have is easy
to understand but nice to tidy up the odd corner or two.
We can bikeshed this for ever so I've skipped really minor things
where phrasing is debatable (particularly British vs US English :)

Jonathan


> ---
>  .../admin-guide/perf/dwc_pcie_pmu.rst         | 97 +++++++++++++++++++
>  Documentation/admin-guide/perf/index.rst      |  1 +
>  2 files changed, 98 insertions(+)
>  create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> 
> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> new file mode 100644
> index 000000000000..c1f671cb64ec
> --- /dev/null
> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
> @@ -0,0 +1,97 @@
> +======================================================================
> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
> +======================================================================
> +
> +DesignWare Cores (DWC) PCIe PMU
> +===============================
> +
> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
> +only PCIe configuration space register block provided by each PCIe Root

I don't think you need the negative bit of description - it's not a lot of
different things and this statement only really makes sense when compared to
some other PCIe PMUs which the reader may never have come across.

"The PMU is a PCIe configuration space register block provided by each PCIe Root
Port in a Vendor-Specific Extended Capability ..."

> +Port in a Vendor-Specific Extended Capability named RAS DES (Debug, Error
> +injection, and Statistics).
> +
> +As the name indicated, the RAS DES capability supports system level

"As the name indicates," (present tense more appropriate here)

> +debugging, AER error injection, and collection of statistics. To facilitate
> +collection of statistics, Synopsys DesignWare Cores PCIe controller

"Core's"

(as it belongs to the core rather than intent being that it applies to plural
cores?)

> +provides the following two features:
> +
> +- Time Based Analysis (RX/TX data throughput and time spent in each
> +  low-power LTSSM state)
> +- Lane Event counters (Error and Non-Error for lanes)
> +
> +Time Based Analysis
> +-------------------
> +
> +Using this feature you can obtain information regarding RX/TX data
> +throughput and time spent in each low-power LTSSM state by the controller.
> +
> +The counters are 64-bit width and measure data in two categories,
> +
> +- percentage of time does the controller stay in LTSSM state in a

"percentage of time the controller stays in LTSSM " 

> +  configurable duration. The measurement range of each Event in Group#0.

I'm not sure of the meaning of the last sentence.  Is it simply that this bullet
refers to group#0?  Perhaps make that the lead off. e.g.

- Group#0: Percentage of time the controller stays in LTSSM states.
- Group#1: Amount of data processed (Units of 16 bytes).

> +- amount of data processed (Units of 16 bytes). The measurement range of
> +  each Event in Group#1.
> +
> +Lane Event counters
> +-------------------
> +
> +Using this feature you can obtain Error and Non-Error information in
> +specific lane by the controller.
> +
> +The counters are 32 bits wide and the measured event is selected by:
> +
> +- Group i
> +- Event j within the Group i
> +- and Lane k
> +
> +Some of the event counters only exist for specific configurations.
> +
> +DesignWare Cores (DWC) PCIe PMU Driver
> +=======================================
> +
> +This driver add PMU devices for each PCIe Root Port. And the PMU device is

"This driver adds PMU devices for each PCIe Root Port.  The PMU device is named"

(Not good to start a sentence with And - an alternative form would be)

"This driver adds PMU devices for each PCIe Root Port named based on the BDF of
the Root Port." 
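
For what it's worth, the name suffix looks to be just the 16-bit PCI_DEVID() of
the Root Port, so it may help to spell that out for readers mapping lspci output
to PMU names.  A minimal userspace sketch, assuming the two macros below mirror
the kernel's PCI_DEVFN() and PCI_DEVID(); dwc_pmu_name() is a hypothetical
helper for illustration, not something in the patch:

```c
#include <stdint.h>
#include <stdio.h>

/* Userspace copies of the kernel macros, for illustration only. */
#define PCI_DEVFN(slot, func)	((((slot) & 0x1f) << 3) | ((func) & 0x07))
#define PCI_DEVID(bus, devfn)	((((uint16_t)(bus)) << 8) | (devfn))

/* Hypothetical helper: build the PMU name from bus/device/function. */
static int dwc_pmu_name(char *buf, size_t len, unsigned int bus,
			unsigned int slot, unsigned int func)
{
	unsigned int bdf = PCI_DEVID(bus, PCI_DEVFN(slot, func));

	return snprintf(buf, len, "dwc_rootport_%x", bdf);
}
```

For a Root Port at 30:03.0, devfn is (3 << 3) | 0 = 0x18, so the BDF value is
0x3018 and the name comes out as dwc_rootport_3018.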

> +named based the BDF of Root Port. For example,
> +
> +    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
> +
> +the PMU device name for this Root Port is dwc_rootport_3018.
> +
> +The DWC PCIe PMU driver registers a perf PMU driver, which provides
> +description of available events and configuration options in sysfs, see
> +/sys/bus/event_source/devices/dwc_rootport_{bdf}.
> +
> +The "format" directory describes format of the config, fields of the

"config fields" (stray comma makes this confusing to read)

> +perf_event_attr structure. The "events" directory provides configuration
> +templates for all documented events.  For example,
> +"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
> +
> +The "perf list" command shall list the available events from sysfs, e.g.::
> +
> +    $# perf list | grep dwc_rootport
> +    <...>
> +    dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/        [Kernel PMU event]
> +    <...>
> +    dwc_rootport_3018/rx_memory_read,lane=?/               [Kernel PMU event]
> +
> +Time Based Analysis Event Usage
> +-------------------------------
> +
> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> +
> +    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
> +
> +The average RX/TX bandwidth can be calculated using the following formula:
> +
> +    PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
> +    PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
> +
> +Lane Event Usage
> +-------------------------------
> +
> +Each lane has the same event set and to avoid generating a list of hundreds
> +of events, the user needs to specify the lane ID explicitly, e.g.::
> +
> +    $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
> +
> +The driver does not support sampling, therefore "perf record" will not
> +work. Per-task (without "-a") perf sessions are not supported.
> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
> index 9de64a40adab..11a80cd28a2e 100644
> --- a/Documentation/admin-guide/perf/index.rst
> +++ b/Documentation/admin-guide/perf/index.rst
> @@ -19,5 +19,6 @@ Performance monitor support
>     arm_dsu_pmu
>     thunderx2-pmu
>     alibaba_pmu
> +   dwc_pcie_pmu
>     nvidia-pmu
>     meson-ddr-pmu



* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-06-06 15:14   ` Yicong Yang
@ 2023-07-27  9:39     ` Jonathan Cameron
  2023-07-28 12:41       ` Shuai Xue
  2023-07-28  1:31     ` Shuai Xue
  1 sibling, 1 reply; 31+ messages in thread
From: Jonathan Cameron @ 2023-07-27  9:39 UTC (permalink / raw)
  To: Yicong Yang
  Cc: Shuai Xue, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, yangyicong, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song

On Tue, 6 Jun 2023 23:14:07 +0800
Yicong Yang <yangyicong@huawei.com> wrote:

> On 2023/6/6 15:49, Shuai Xue wrote:
> > This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
> > for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
> > Core controller IP which provides statistics feature. The PMU is not a PCIe
> > Root Complex integrated End Point(RCiEP) device but only register counters
> > provided by each PCIe Root Port.
> > 
> > To facilitate collection of statistics the controller provides the
> > following two features for each Root Port:
> > 
> > - Time Based Analysis (RX/TX data throughput and time spent in each
> >   low-power LTSSM state)
> > - Event counters (Error and Non-Error for lanes)
> > 
> > > Note, there is only one counter for each type and no overflow interrupt.
> > 
> > This driver adds PMU devices for each PCIe Root Port. And the PMU device is
> > named based the BDF of Root Port. For example,
> > 
> >     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
> > 
> > the PMU device name for this Root Port is dwc_rootport_3018.
> > 
> > Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
> > 
> >     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
> > 
> > > The average RX bandwidth can be calculated like this:
> > > 
> > >     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
> > 
> > Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
> > Reported-by: kernel test robot <lkp@intel.com>
> > Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
> > Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>

I'll review on top to avoid any duplication with Yicong.

Note I've cropped the stuff neither of us commented on so it's
easier to spot the feedback.

Jonathan

> > ---
> >  drivers/perf/Kconfig        |   7 +
> >  drivers/perf/Makefile       |   1 +
> >  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
> >  3 files changed, 714 insertions(+)
> >  create mode 100644 drivers/perf/dwc_pcie_pmu.c
> > 
> > diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
> > index 711f82400086..6ff3921d7a62 100644
> > --- a/drivers/perf/Kconfig
> > +++ b/drivers/perf/Kconfig
> > @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
> >  	  Enable perf support for Marvell DDR Performance monitoring
> >  	  event on CN10K platform.
> >  
> > +config DWC_PCIE_PMU
> > +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
> > +	depends on (ARM64 && PCI)
> > +	help
> > +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
> > +	  monitoring event on Yitian 710 platform.

The documentation kind of implies this isn't platform specific.
If some parts are (such as which events exist) then you may want to push
that to userspace / perftool with appropriate matching against specific SoC.

If it is generic, then change this text to "event on platforms including the Yitian 710."

> > +
> >  source "drivers/perf/arm_cspmu/Kconfig"
> >  
> >  source "drivers/perf/amlogic/Kconfig"

> > new file mode 100644
> > index 000000000000..8bfcf6e0662d
> > --- /dev/null
> > +++ b/drivers/perf/dwc_pcie_pmu.c
> > @@ -0,0 +1,706 @@

...

> > +
> > +struct dwc_pcie_pmu {
> > +	struct pci_dev		*pdev;		/* Root Port device */  
> 
> > If the root port is removed after the probe of this PCIe PMU driver, we'll access a NULL
> > pointer. I didn't see you hold a reference on the root port to prevent the removal.
> 
> > +	u16			ras_des;	/* RAS DES capability offset */
> > +	u32			nr_lanes;
> > +
> > +	struct list_head	pmu_node;
> > +	struct hlist_node	cpuhp_node;
> > +	struct pmu		pmu;
> > +	struct perf_event	*event;
> > +	int			oncpu;
> > +};
> > +
> > +struct dwc_pcie_pmu_priv {
> > +	struct device *dev;
> > +	struct list_head pmu_nodes;
> > +};
> > +
> > +#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
> > +  
> 
> > Somebody told me to put @pmu as the first member so that this macro needs no offset calculation. :)
> 

...

> > +static ssize_t dwc_pcie_event_show(struct device *dev,
> > +				struct device_attribute *attr, char *buf)
> > +{
> > +	struct dwc_pcie_event_attr *eattr;
> > +
> > +	eattr = container_of(attr, typeof(*eattr), attr);
> > +
> > +	if (eattr->type == DWC_PCIE_LANE_EVENT)
> > +		return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
> > +				  eattr->eventid, eattr->type);
> > +

Elsewhere you always check for DWC_PCIE_TIME_BASE_EVENT.
Should probably do so here as well for consistency.

> > +	return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", eattr->eventid,
> > +		       eattr->type);
> > +}

> > +static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
> > +	/* Group #0 */
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
> > +
> > +	/* Group #1 */
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
> > +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
> > +
> > +	/*
> > +	 * Leave it to the user to specify the lane ID to avoid generating
> > +	 * a list of hundreds of events.
> > +	 */
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
> > +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
> > +  
> 
> Intended blank line?
> 
> > +	NULL
> > +};


...

> > +static u64 dwc_pcie_pmu_read_time_based_counter(struct dwc_pcie_pmu *pcie_pmu)
> > +{
> > +	struct pci_dev *pdev = pcie_pmu->pdev;
> > +	u16 ras_des = pcie_pmu->ras_des;
> > +	u64 count;
> > +	u32 val;
> > +
> > +	pci_read_config_dword(
> > +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &val);
> > +	count = val;
> > +	count <<= 32;
> > +
> > +	pci_read_config_dword(
> > +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, &val);

It looks like tearing can occur here.  You probably need to protect against that
(the usual trick is to re-read the _HIGH part and, if it changed, try again).

The hardware might prevent tearing (it would freeze the low register when you
read the high one, then only let it change after a read of the low registers is
done).  If that's the case - add a comment to say so.
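
Roughly what I have in mind, as a userspace sketch rather than driver code.  The
read32_fn accessor stands in for pci_read_config_dword(), and fake_counter /
fake_read() are invented here purely to model a counter that keeps running
between reads:

```c
#include <stdint.h>

enum { REG_LOW, REG_HIGH };

/* Hypothetical accessor standing in for a 32-bit config space read. */
typedef uint32_t (*read32_fn)(void *ctx, unsigned int reg);

/*
 * Read a 64-bit counter exposed as two 32-bit halves.  If the high half
 * changed while the low half was being read, the low half wrapped in
 * between, so retry until both reads of the high half agree.
 */
static uint64_t read_counter64(read32_fn rd, void *ctx)
{
	uint32_t hi, lo, hi2;

	do {
		hi = rd(ctx, REG_HIGH);
		lo = rd(ctx, REG_LOW);
		hi2 = rd(ctx, REG_HIGH);
	} while (hi != hi2);

	return ((uint64_t)hi << 32) | lo;
}

/* Mock hardware: the counter advances by @step after every register read. */
struct fake_counter {
	uint64_t value;
	uint64_t step;
};

static uint32_t fake_read(void *ctx, unsigned int reg)
{
	struct fake_counter *c = ctx;
	uint32_t val = reg == REG_HIGH ? (uint32_t)(c->value >> 32)
				       : (uint32_t)c->value;

	c->value += c->step;	/* the "hardware" keeps counting */
	return val;
}
```

With the mock starting at 0xFFFFFFFF and stepping by 1, a plain high-then-low
read would return 0, whereas the loop notices the high half changed and retries.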

> > +
> > +	count += val;
> > +
> > +	return count;
> > +}
> > +


...
> > +static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
> > +{
> > +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
> > +	struct pci_dev *pdev = pcie_pmu->pdev;
> > +	struct hw_perf_event *hwc = &event->hw;
> > +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
> > +	int event_id = DWC_PCIE_EVENT_ID(event);
> > +	int lane = DWC_PCIE_EVENT_LANE(event);
> > +	u16 ras_des = pcie_pmu->ras_des;
> > +	u32 ctrl;
> > +
> > +	/* Only one counter and it is in use */

Yikes. That's quite a restriction.  Probably good to mention in the docs.
I'm a little confused about the architecture though - there seem to be separate
registers for the Lane and time based events.  Can't those be counted at the same time?

> > +	if (pcie_pmu->event)
> > +		return -ENOSPC;
> > +
> > +	pcie_pmu->event = event;
> > +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
> > +
> > +	if (type == DWC_PCIE_LANE_EVENT) {
> > +		/* EVENT_COUNTER_DATA_REG needs clear manually */
> > +		ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
> > +			FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
> > +			FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
> > +			FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
> > +		pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL,
> > +				       ctrl);
> > +	} else if (type == DWC_PCIE_TIME_BASE_EVENT) {
> > +		/*
> > +		 * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
> > +		 * use it with any manually controlled duration. And it is
> > +		 * cleared when next measurement starts.
> > +		 */
> > +		ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
> > +			FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
> > +				   DWC_PCIE_DURATION_MANUAL_CTL) |
> > +			DWC_PCIE_TIME_BASED_CNT_ENABLE;
> > +		pci_write_config_dword(
> > +			pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
> > +	}
> > +
> > +	if (flags & PERF_EF_START)
> > +		dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
> > +
> > +	perf_event_update_userpage(event);
> > +
> > +	return 0;
> > +}
...

> > +static int __dwc_pcie_pmu_probe(struct dwc_pcie_pmu_priv *priv)
> > +{
> > +	struct pci_dev *pdev = NULL;
> > +	struct dwc_pcie_pmu *pcie_pmu;
> > +	char *name;
> > +	u32 bdf;
> > +	int ret;
> > +
> > +	INIT_LIST_HEAD(&priv->pmu_nodes);
> > +
> > +	/* Match the rootport with VSEC_RAS_DES_ID, and register a PMU for it */
> > +	for_each_pci_dev(pdev) {
> > +		u16 vsec;
> > +		u32 val;
> > +
> > +		if (!(pci_is_pcie(pdev) &&
> > +		      pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
> > +			continue;
> > +
> > +		vsec = pci_find_vsec_capability(pdev, PCI_VENDOR_ID_ALIBABA,
> > +						DWC_PCIE_VSEC_RAS_DES_ID);
> > +		if (!vsec)
> > +			continue;
> > +
> > +		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
> > +		if (PCI_VNDR_HEADER_REV(val) != 0x04 ||
> > +		    PCI_VNDR_HEADER_LEN(val) != 0x100)
> > +			continue;
> > +		pci_dbg(pdev,
> > +			"Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
> > +
> > +		bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
> > +		name = devm_kasprintf(priv->dev, GFP_KERNEL, "dwc_rootport_%x",
> > +				      bdf);
> > +		if (!name)
> > +			return -ENOMEM;
> > +
> > +		/* All checks passed, go go go */
> > +		pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
> > +		if (!pcie_pmu) {
> > +			pci_dev_put(pdev);  
> 
> we need to call pci_dev_put() on all the return branches below and above, and after the
> for_each_pci_dev() loop, to keep the refcount balanced.

Good spot. I'd use a goto for this given there are lots of places.

> 
> > +			return -ENOMEM;
> > +		}
> > +
> > +		pcie_pmu->pdev = pdev;
> > +		pcie_pmu->ras_des = vsec;
> > +		pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
> > +		pcie_pmu->pmu = (struct pmu){
> > +			.module		= THIS_MODULE,
> > +			.attr_groups	= dwc_pcie_attr_groups,
> > +			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
> > +			.task_ctx_nr	= perf_invalid_context,
> > +			.event_init	= dwc_pcie_pmu_event_init,
> > +			.add		= dwc_pcie_pmu_event_add,
> > +			.del		= dwc_pcie_pmu_event_del,
> > +			.start		= dwc_pcie_pmu_event_start,
> > +			.stop		= dwc_pcie_pmu_event_stop,
> > +			.read		= dwc_pcie_pmu_event_update,
> > +		};
> > +
> > +		/* Add this instance to the list used by the offline callback */
> > +		ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
> > +					       &pcie_pmu->cpuhp_node);
> > +		if (ret) {
> > +			pci_err(pcie_pmu->pdev,
> > +				"Error %d registering hotplug @%x\n", ret, bdf);
> > +			return ret;
> > +		}

Here you mix non-devm_ handling midway through a series of devm_ calls.
Whilst I 'think' what you have here is fine, I prefer to minimize thinking
whilst reviewing and using devm_add_action_or_reset() with callbacks
in appropriate places would ensure automatic unwinding in the error
path deals with everything in the reverse order of setup.

You just need two instances - one to unwind the cpuhp_state_add_instance() and
one to unwind the perf_pmu_register()
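
To make the ordering argument concrete, here is a minimal userspace model of
that idea. It is not kernel code: action_stack, add_action_or_reset(),
run_actions() and model_probe() are hypothetical stand-ins for the devres
machinery behind devm_add_action_or_reset(), and the 'H'/'P' markers merely
represent undoing cpuhp_state_add_instance() and perf_pmu_register().

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *data);

/* A stack of undo callbacks, newest on top. */
struct action_stack {
	action_fn fn[MAX_ACTIONS];
	void *data[MAX_ACTIONS];
	size_t count;
};

/* Run all registered actions, newest first, then leave the stack empty. */
static void run_actions(struct action_stack *s)
{
	while (s->count > 0) {
		s->count--;
		s->fn[s->count](s->data[s->count]);
	}
}

/*
 * Record an undo action. If it cannot be recorded, run it immediately
 * (the "or_reset" part) and report failure, so the caller never has to
 * undo that step by hand.
 */
static int add_action_or_reset(struct action_stack *s, action_fn fn, void *data)
{
	assert(s != NULL && fn != NULL);
	if (s->count == MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	s->fn[s->count] = fn;
	s->data[s->count] = data;
	s->count++;
	return 0;
}

/* Trace of undo steps, so the unwind order can be observed. */
static char unwind_log[16];
static size_t unwind_len;

static void log_step(void *data)
{
	if (unwind_len < sizeof(unwind_log) - 1)
		unwind_log[unwind_len++] = *(const char *)data;
	unwind_log[unwind_len] = '\0';
}

/* Model of one probe iteration; returns 0 on success, -1 on failure. */
static int model_probe(struct action_stack *s, int fail_pmu_register)
{
	static const char hotplug = 'H', pmu = 'P';

	/* Step 1: hotplug instance added, queue its undo. */
	if (add_action_or_reset(s, log_step, (void *)&hotplug))
		return -1;

	/* Step 2: PMU registration fails, so there is nothing to queue for it. */
	if (fail_pmu_register)
		return -1;

	/* Step 2 succeeded: queue the PMU unregistration. */
	return add_action_or_reset(s, log_step, (void *)&pmu);
}
```

In this model the caller invokes run_actions() itself; with devm the driver
core does the equivalent when probe fails or the device is unbound. Either
way the undo order falls out of the registration order, so no error path has
to spell it out.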
 
> > +		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
> > +		if (ret) {
> > +			pci_err(pcie_pmu->pdev,
> > +				"Error %d registering PMU @%x\n", ret, bdf);
> > +			cpuhp_state_remove_instance_nocalls(
> > +				dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node);
> > +			return ret;
> > +		}
> > +
> > +		/* Add registered PMUs and unregister them when this driver remove */
> > +		list_add(&pcie_pmu->pmu_node, &priv->pmu_nodes);

This handling would be replaced by the tracking devm is doing for us. So I think
there will be no need for the list.


> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int dwc_pcie_pmu_remove(struct platform_device *pdev)
> > +{
> > +	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
> > +	struct dwc_pcie_pmu *pcie_pmu;
> > +
> > +	list_for_each_entry(pcie_pmu, &priv->pmu_nodes, pmu_node) {
> > +		cpuhp_state_remove_instance(dwc_pcie_pmu_hp_state,
> > +					    &pcie_pmu->cpuhp_node);
> > +		perf_pmu_unregister(&pcie_pmu->pmu);  
> 
> should unregister the PMU first, keep the order reverse to __dwc_pcie_pmu_probe().
These two could have been handled via appropriate devm_add_action_or_reset()
above and let that infrastructure unwind for us in the error path.

If anyone fixes the whole pmu drivers aren't removable mess, then we will
also end up with remove handling for free :)

> 
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static int dwc_pcie_pmu_probe(struct platform_device *pdev)
> > +{
> > +	struct dwc_pcie_pmu_priv *priv;
> > +	int ret;
> > +
> > +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
> > +	if (!priv)
> > +		return -ENOMEM;
> > +
> > +	priv->dev = &pdev->dev;
> > +	platform_set_drvdata(pdev, priv);
> > +
> > +	/* If one PMU registration fails, remove all. */
> > +	ret = __dwc_pcie_pmu_probe(priv);
> > +	if (ret) {
> > +		dwc_pcie_pmu_remove(pdev);

There is a bit of mixing of devm and not here which makes things somewhat
hard to reason about.  Perhaps take the whole unwind flow over to devm managed.
See above.

> > +		return ret;
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void dwc_pcie_pmu_migrate(struct dwc_pcie_pmu *pcie_pmu, unsigned int cpu)
> > +{
> > +	/* This PMU does NOT support interrupt, just migrate context. */
> > +	perf_pmu_migrate_context(&pcie_pmu->pmu, pcie_pmu->oncpu, cpu);
> > +	pcie_pmu->oncpu = cpu;
> > +}
> > +
> > +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> > +{
> > +	struct dwc_pcie_pmu *pcie_pmu;
> > +	struct pci_dev *pdev;
> > +	int node;
> > +
> > +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> > +	pdev = pcie_pmu->pdev;
> > +	node = dev_to_node(&pdev->dev);
> > +
> > +	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&

Perhaps worth a comment on when you might see node == NUMA_NO_NODE?
Beyond NUMA being entirely disabled, I'd hope that never happens and for that you
might be able to use a compile time check.

I wonder if this can be simplified by a flag that says if we are already in the
right node? Might be easier to follow than having similar dance in online and offline
to figure that out.


> > +	    cpu_to_node(cpu) == node)
> > +		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
> > +
> > +	return 0;
> > +}
> > +
> > +static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> > +{
> > +	struct dwc_pcie_pmu *pcie_pmu;
> > +	struct pci_dev *pdev;
> > +	int node;
> > +	cpumask_t mask;
> > +	unsigned int target;
> > +
> > +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> > +	if (cpu != pcie_pmu->oncpu)
> > +		return 0;
> > +
> > +	pdev = pcie_pmu->pdev;
> > +	node = dev_to_node(&pdev->dev);
> > +	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
> > +	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
> > +		target = cpumask_any(&mask);  
> 
> The cpumask_of_node() only contains the online CPUs so this branch is redundant. For arm64
> using arch_numa.c the node cpumask is updated in numa_{add, remove}_cpu() and for other
> architectures the behaviour should stay consistent. Please correct me if I'm wrong.
> 
> > +	else
> > +		target = cpumask_any_but(cpu_online_mask, cpu);

If following above suggestion, would set flag to say in wrong node here - and wherever
you end up in a node to start with...


> > +	if (target < nr_cpu_ids)
> > +		dwc_pcie_pmu_migrate(pcie_pmu, target);
> > +
> > +	return 0;
> > +}
> > +
> > +static struct platform_driver dwc_pcie_pmu_driver = {
> > +	.probe = dwc_pcie_pmu_probe,
> > +	.remove = dwc_pcie_pmu_remove,
> > +	.driver = {.name = "dwc_pcie_pmu",},
> > +};
> > +
> > +static int __init dwc_pcie_pmu_init(void)
> > +{
> > +	int ret;
> > +
> > +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> > +				      "perf/dwc_pcie_pmu:online",
> > +				      dwc_pcie_pmu_online_cpu,
> > +				      dwc_pcie_pmu_offline_cpu);
> > +	if (ret < 0)
> > +		return ret;
> > +
> > +	dwc_pcie_pmu_hp_state = ret;
> > +
> > +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
> > +	if (ret) {
> > +		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
> > +		return ret;
> > +	}
> > +
> > +	dwc_pcie_pmu_dev = platform_device_register_simple(
> > +				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
> > +	if (IS_ERR(dwc_pcie_pmu_dev)) {
> > +		platform_driver_unregister(&dwc_pcie_pmu_driver);  
> 
> On failure we also need to remove cpuhp state as well.

I'd suggest using gotos and a single error handling block. That
makes it both harder to forget things like this and easier to
compare that block with what happens in exit() - so slightly 
easier to review!
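
As an illustration of the single error-handling block, here is a small
userspace sketch. The three steps stand for the hotplug state setup, the
driver registration and the device registration in the init function above;
init_state, setup_step(), model_init() and model_exit() are hypothetical
names used only for this model, not part of the driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Which resources are currently held; stands in for real kernel state. */
struct init_state {
	bool hotplug_state;
	bool driver;
	bool device;
};

/* Acquire one resource, or fail if this is the step selected by fail_at. */
static int setup_step(bool *held, int step, int fail_at)
{
	if (step == fail_at)
		return -1;
	*held = true;
	return 0;
}

static int model_init(struct init_state *st, int fail_at)
{
	int ret;

	assert(st != NULL);

	ret = setup_step(&st->hotplug_state, 1, fail_at);
	if (ret)
		return ret;

	ret = setup_step(&st->driver, 2, fail_at);
	if (ret)
		goto err_remove_hotplug;

	ret = setup_step(&st->device, 3, fail_at);
	if (ret)
		goto err_unregister_driver;

	return 0;

	/* Labels fall through, undoing the steps in reverse order of setup. */
err_unregister_driver:
	st->driver = false;
err_remove_hotplug:
	st->hotplug_state = false;
	return ret;
}

/* Exit path: the same teardown sequence, starting one step earlier. */
static void model_exit(struct init_state *st)
{
	assert(st != NULL);
	st->device = false;
	st->driver = false;
	st->hotplug_state = false;
}
```

Because the labels fall through, a failure at the last step releases both
earlier resources, and the label block can be read side by side with
model_exit() to check that nothing is missed.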

> 
> Thanks,
> Yicong
> 
> > +		return PTR_ERR(dwc_pcie_pmu_dev);
> > +	}
> > +
> > +	return 0;
> > +}
> > +
> > +static void __exit dwc_pcie_pmu_exit(void)
> > +{
> > +	platform_device_unregister(dwc_pcie_pmu_dev);
> > +	platform_driver_unregister(&dwc_pcie_pmu_driver);
> > +	cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
> > +}
> > +
> > +module_init(dwc_pcie_pmu_init);
> > +module_exit(dwc_pcie_pmu_exit);
> > +
> > +MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller");
> > +MODULE_AUTHOR("Shuai xue <xueshuai@linux.alibaba.com>");
> > +MODULE_AUTHOR("Wen Cheng <yinxuan_cw@linux.alibaba.com>");
> > +MODULE_LICENSE("GPL v2");
> >   


* Re: [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  2023-07-27  8:57   ` Jonathan Cameron
@ 2023-07-27 12:52     ` Shuai Xue
  2023-07-28 10:18       ` Jonathan Cameron
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-07-27 12:52 UTC (permalink / raw)
  To: Jonathan Cameron
  Cc: chengyou, kaishen, helgaas, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/7/27 16:57, Jonathan Cameron wrote:
> On Tue, 6 Jun 2023 15:49:35 +0800
> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> 
>> Alibaba's T-Head Yitian 710 SoC includes Synopsys' DesignWare Core PCIe
>> controller, which implements a PMU for performance and
>> functional debugging to facilitate system maintenance.
>>
>> Document it to provide guidance on how to use it.
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> 
> Given this looks like it might move forwards (after Bjorn's reply)
> I'll give it a closer review :)

That's great to hear! I appreciate the effort that has been put into resuming the
review process. Thank you for your dedication and hard work in making this happen.

> 
> Some editorial things in here only. What you have is easy
> to understand but nice to tidy up the odd corner or two.
> We can bikeshed this for ever so I've skipped really minor things
> where phrasing is debatable (particularly British vs US English :)

Thank you for patiently pointing out the writing issues. I appreciate your feedback
and will make the necessary improvements.

(Comments replied inline)

Best Regards,
Shuai

> 
> Jonathan
> 
> 
>> ---
>>  .../admin-guide/perf/dwc_pcie_pmu.rst         | 97 +++++++++++++++++++
>>  Documentation/admin-guide/perf/index.rst      |  1 +
>>  2 files changed, 98 insertions(+)
>>  create mode 100644 Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>>
>> diff --git a/Documentation/admin-guide/perf/dwc_pcie_pmu.rst b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> new file mode 100644
>> index 000000000000..c1f671cb64ec
>> --- /dev/null
>> +++ b/Documentation/admin-guide/perf/dwc_pcie_pmu.rst
>> @@ -0,0 +1,97 @@
>> +======================================================================
>> +Synopsys DesignWare Cores (DWC) PCIe Performance Monitoring Unit (PMU)
>> +======================================================================
>> +
>> +DesignWare Cores (DWC) PCIe PMU
>> +===============================
>> +
>> +The PMU is not a PCIe Root Complex integrated End Point (RCiEP) device but
>> +only PCIe configuration space register block provided by each PCIe Root
> 
> I don't think you need the negative bit of description - it's not a lot of
> different things and this statement only really makes sense when compared to
> some other PCIe PMUs which the reader may never have come across.
> 
> "The PMU is a PCIe configuration space register block provided by each PCIE Root
> Port in a Vendor-Specific Extended Capability ..."

Aha, you are right, I should not have made such assumptions, will adopt your
rewriting.

> 
>> +Port in a Vendor-Specific Extended Capability named RAS DES (Debug, Error
>> +injection, and Statistics).
>> +
>> +As the name indicated, the RAS DES capability supports system level
> 
> "As the name indicates," (present tense more appropriate here)

Will fix it.

> 
>> +debugging, AER error injection, and collection of statistics. To facilitate
>> +collection of statistics, Synopsys DesignWare Cores PCIe controller
> 
> "Core's"
> 
> (as it belongs to the core rather than intent being that it applies to plural
> cores?)

"Synopsys DesignWare Cores PCIe controller" is taken from the title of the Synopsys
databook, so I prefer to keep it as it is here.


> 
>> +provides the following two features:
>> +
>> +- Time Based Analysis (RX/TX data throughput and time spent in each
>> +  low-power LTSSM state)
>> +- Lane Event counters (Error and Non-Error for lanes)
>> +
>> +Time Based Analysis
>> +-------------------
>> +
>> +Using this feature you can obtain information regarding RX/TX data
>> +throughput and time spent in each low-power LTSSM state by the controller.
>> +
>> +The counters are 64-bit width and measure data in two categories,
>> +
>> +- percentage of time does the controller stay in LTSSM state in a
> 
> "percentage of time the controller stays in LTSSM " 

Will fix it.

> 
>> +  configurable duration. The measurement range of each Event in Group#0.
> 
> I'm not sure of meaning of the last sentence.  Is it simply that this bullet
> refers to group#0?  Perhaps make that the lead off. e.g.
> 
> - Group#0: Percentage of time the controller stays in LTSSM states.
> - Group#1: Amount of data processed (Units of 16 bytes).

You are right. Will fix it.

> 
>> +- amount of data processed (Units of 16 bytes). The measurement range of
>> +  each Event in Group#1.
>> +
>> +Lane Event counters
>> +-------------------
>> +
>> +Using this feature you can obtain Error and Non-Error information in
>> +specific lane by the controller.
>> +
>> +The counters are 32-bit width and the measured event is select by:
>> +
>> +- Group i
>> +- Event j within the Group i
>> +- and Lane k
>> +
>> +Some of the event counters only exist for specific configurations.
>> +
>> +DesignWare Cores (DWC) PCIe PMU Driver
>> +=======================================
>> +
>> +This driver add PMU devices for each PCIe Root Port. And the PMU device is
> 
> "This driver adds PMU devices for each PCIe Root Port.  The PMU device is named"
> 
> (Not good to start a sentence with And - an alternative form would be)
> 
> "This driver adds PMU devices for each PCIe Root Port named based on the BDF of
> the Root Port." 

Ok, will fix it.

> 
>> +named based the BDF of Root Port. For example,
>> +
>> +    30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>> +
>> +the PMU device name for this Root Port is dwc_rootport_3018.
>> +
>> +The DWC PCIe PMU driver registers a perf PMU driver, which provides
>> +description of available events and configuration options in sysfs, see
>> +/sys/bus/event_source/devices/dwc_rootport_{bdf}.
>> +
>> +The "format" directory describes format of the config, fields of the
> 
> "config fields" (stray comma makes this confusing to read)

Will fix it.



>> +perf_event_attr structure. The "events" directory provides configuration
>> +templates for all documented events.  For example,
>> +"Rx_PCIe_TLP_Data_Payload" is an equivalent of "eventid=0x22,type=0x1".
>> +
>> +The "perf list" command shall list the available events from sysfs, e.g.::
>> +
>> +    $# perf list | grep dwc_rootport
>> +    <...>
>> +    dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/        [Kernel PMU event]
>> +    <...>
>> +    dwc_rootport_3018/rx_memory_read,lane=?/               [Kernel PMU event]
>> +
>> +Time Based Analysis Event Usage
>> +-------------------------------
>> +
>> +Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>> +
>> +    $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>> +
>> +The average RX/TX bandwidth can be calculated using the following formula:
>> +
>> +    PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>> +    PCIe TX Bandwidth = PCIE_TX_DATA * 16B / Measure_Time_Window
>> +
>> +Lane Event Usage
>> +-------------------------------
>> +
>> +Each lane has the same event set and to avoid generating a list of hundreds
>> +of events, the user need to specify the lane ID explicitly, e.g.::
>> +
>> +    $# perf stat -a -e dwc_rootport_3018/rx_memory_read,lane=4/
>> +
>> +The driver does not support sampling, therefore "perf record" will not
>> +work. Per-task (without "-a") perf sessions are not supported.
>> diff --git a/Documentation/admin-guide/perf/index.rst b/Documentation/admin-guide/perf/index.rst
>> index 9de64a40adab..11a80cd28a2e 100644
>> --- a/Documentation/admin-guide/perf/index.rst
>> +++ b/Documentation/admin-guide/perf/index.rst
>> @@ -19,5 +19,6 @@ Performance monitor support
>>     arm_dsu_pmu
>>     thunderx2-pmu
>>     alibaba_pmu
>> +   dwc_pcie_pmu
>>     nvidia-pmu
>>     meson-ddr-pmu
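
The bandwidth formula in the quoted documentation (count * 16B /
Measure_Time_Window) can be sketched as a small helper. This is only an
illustration: pcie_bandwidth_bytes_per_sec() and PAYLOAD_UNIT_BYTES are
hypothetical names, the count is assumed to be the raw value reported by
"perf stat" for a TLP data payload event, and the window is assumed to be the
elapsed measurement time in seconds.

```c
#include <assert.h>
#include <stdint.h>

/* Each count of a TLP data payload event represents 16 bytes. */
#define PAYLOAD_UNIT_BYTES 16.0

/*
 * Average bandwidth in bytes per second for a raw payload count collected
 * over window_seconds. The count is converted to double before scaling so
 * that very large 64-bit counts cannot overflow the multiplication.
 */
static double pcie_bandwidth_bytes_per_sec(uint64_t payload_count,
					   double window_seconds)
{
	assert(window_seconds > 0.0);
	return (double)payload_count * PAYLOAD_UNIT_BYTES / window_seconds;
}
```

For example, a count of 1000000 over a 2 second window corresponds to
16000000 bytes in total, i.e. 8000000 bytes per second.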


* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-06-06 15:14   ` Yicong Yang
  2023-07-27  9:39     ` Jonathan Cameron
@ 2023-07-28  1:31     ` Shuai Xue
  1 sibling, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-07-28  1:31 UTC (permalink / raw)
  To: Yicong Yang, chengyou, kaishen, helgaas, will, Jonathan.Cameron,
	baolin.wang, robin.murphy
  Cc: yangyicong, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/6/6 23:14, Yicong Yang wrote:

Hi, Yicong,

Thank you for your valuable comments, and I apologize for missing your previous
message. It appears that Thunderbird had mistakenly placed your email in the
junk folder, causing me to overlook it.

Jonathan's reply served as a reminder, prompting me to realize that I had missed
some emails. Since Jonathan's reply is on top of yours, I will give my feedback
on both of your messages in his thread.

Thank you.

Best Regards,
Shuai


> On 2023/6/6 15:49, Shuai Xue wrote:
>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>> Root Complex integrated End Point(RCiEP) device but only register counters
>> provided by each PCIe Root Port.
>>
>> To facilitate collection of statistics the controller provides the
>> following two features for each Root Port:
>>
>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>   low-power LTSSM state)
>> - Event counters (Error and Non-Error for lanes)
>>
>> Note: there is only one counter for each type, and overflow interrupts are not supported.
>>
>> This driver adds PMU devices for each PCIe Root Port. The PMU device is
>> named based on the BDF of the Root Port. For example,
>>
>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>
>> the PMU device name for this Root Port is dwc_rootport_3018.
>>
>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>
>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>
>> The average RX bandwidth can be calculated like this:
>>
>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>
>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>> Reported-by: kernel test robot <lkp@intel.com>
>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>> ---
>>  drivers/perf/Kconfig        |   7 +
>>  drivers/perf/Makefile       |   1 +
>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>  3 files changed, 714 insertions(+)
>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>
>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>> index 711f82400086..6ff3921d7a62 100644
>> --- a/drivers/perf/Kconfig
>> +++ b/drivers/perf/Kconfig
>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>  	  Enable perf support for Marvell DDR Performance monitoring
>>  	  event on CN10K platform.
>>  
>> +config DWC_PCIE_PMU
>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>> +	depends on (ARM64 && PCI)
>> +	help
>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>> +	  monitoring event on Yitian 710 platform.
>> +
>>  source "drivers/perf/arm_cspmu/Kconfig"
>>  
>>  source "drivers/perf/amlogic/Kconfig"
>> diff --git a/drivers/perf/Makefile b/drivers/perf/Makefile
>> index dabc859540ce..13a6d1b286da 100644
>> --- a/drivers/perf/Makefile
>> +++ b/drivers/perf/Makefile
>> @@ -22,5 +22,6 @@ obj-$(CONFIG_MARVELL_CN10K_TAD_PMU) += marvell_cn10k_tad_pmu.o
>>  obj-$(CONFIG_MARVELL_CN10K_DDR_PMU) += marvell_cn10k_ddr_pmu.o
>>  obj-$(CONFIG_APPLE_M1_CPU_PMU) += apple_m1_cpu_pmu.o
>>  obj-$(CONFIG_ALIBABA_UNCORE_DRW_PMU) += alibaba_uncore_drw_pmu.o
>> +obj-$(CONFIG_DWC_PCIE_PMU) += dwc_pcie_pmu.o
>>  obj-$(CONFIG_ARM_CORESIGHT_PMU_ARCH_SYSTEM_PMU) += arm_cspmu/
>>  obj-$(CONFIG_MESON_DDR_PMU) += amlogic/
>> diff --git a/drivers/perf/dwc_pcie_pmu.c b/drivers/perf/dwc_pcie_pmu.c
>> new file mode 100644
>> index 000000000000..8bfcf6e0662d
>> --- /dev/null
>> +++ b/drivers/perf/dwc_pcie_pmu.c
>> @@ -0,0 +1,706 @@
>> +// SPDX-License-Identifier: GPL-2.0
>> +/*
>> + * Synopsys DesignWare PCIe PMU driver
>> + *
>> + * Copyright (C) 2021-2023 Alibaba Inc.
>> + */
>> +
>> +#include <linux/bitfield.h>
>> +#include <linux/bitops.h>
>> +#include <linux/cpuhotplug.h>
>> +#include <linux/cpumask.h>
>> +#include <linux/device.h>
>> +#include <linux/errno.h>
>> +#include <linux/kernel.h>
>> +#include <linux/list.h>
>> +#include <linux/perf_event.h>
>> +#include <linux/pci.h>
>> +#include <linux/platform_device.h>
>> +#include <linux/smp.h>
>> +#include <linux/sysfs.h>
>> +#include <linux/types.h>
>> +
>> +#define DWC_PCIE_VSEC_RAS_DES_ID		0x02
>> +
>> +#define DWC_PCIE_EVENT_CNT_CTL			0x8
>> +
>> +/*
>> + * Event Counter Data Select includes two parts:
>> + * - 27-24: Group number(4-bit: 0..0x7)
>> + * - 23-16: Event number(8-bit: 0..0x13) within the Group
>> + *
>> + * Put them together, as the TRM does.
>> + */
>> +#define DWC_PCIE_CNT_EVENT_SEL			GENMASK(27, 16)
>> +#define DWC_PCIE_CNT_LANE_SEL			GENMASK(11, 8)
>> +#define DWC_PCIE_CNT_STATUS			BIT(7)
>> +#define DWC_PCIE_CNT_ENABLE			GENMASK(4, 2)
>> +#define DWC_PCIE_PER_EVENT_OFF			0x1
>> +#define DWC_PCIE_PER_EVENT_ON			0x3
>> +#define DWC_PCIE_EVENT_CLEAR			GENMASK(1, 0)
>> +#define DWC_PCIE_EVENT_PER_CLEAR		0x1
>> +
>> +#define DWC_PCIE_EVENT_CNT_DATA			0xC
>> +
>> +#define DWC_PCIE_TIME_BASED_ANAL_CTL		0x10
>> +#define DWC_PCIE_TIME_BASED_REPORT_SEL		GENMASK(31, 24)
>> +#define DWC_PCIE_TIME_BASED_DURATION_SEL	GENMASK(15, 8)
>> +#define DWC_PCIE_DURATION_MANUAL_CTL		0x0
>> +#define DWC_PCIE_DURATION_1MS			0x1
>> +#define DWC_PCIE_DURATION_10MS			0x2
>> +#define DWC_PCIE_DURATION_100MS			0x3
>> +#define DWC_PCIE_DURATION_1S			0x4
>> +#define DWC_PCIE_DURATION_2S			0x5
>> +#define DWC_PCIE_DURATION_4S			0x6
>> +#define DWC_PCIE_DURATION_4US			0xFF
>> +#define DWC_PCIE_TIME_BASED_TIMER_START		BIT(0)
>> +#define DWC_PCIE_TIME_BASED_CNT_ENABLE		0x1
>> +
>> +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW	0x14
>> +#define DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH	0x18
>> +
>> +/* Event attributes */
>> +#define DWC_PCIE_CONFIG_EVENTID			GENMASK(15, 0)
>> +#define DWC_PCIE_CONFIG_TYPE			GENMASK(19, 16)
>> +#define DWC_PCIE_CONFIG_LANE			GENMASK(27, 20)
>> +
>> +#define DWC_PCIE_EVENT_ID(event)	FIELD_GET(DWC_PCIE_CONFIG_EVENTID, (event)->attr.config)
>> +#define DWC_PCIE_EVENT_TYPE(event)	FIELD_GET(DWC_PCIE_CONFIG_TYPE, (event)->attr.config)
>> +#define DWC_PCIE_EVENT_LANE(event)	FIELD_GET(DWC_PCIE_CONFIG_LANE, (event)->attr.config)
>> +
>> +enum dwc_pcie_event_type {
>> +	DWC_PCIE_TYPE_INVALID,
>> +	DWC_PCIE_TIME_BASE_EVENT,
>> +	DWC_PCIE_LANE_EVENT,
>> +};
>> +
>> +#define DWC_PCIE_LANE_EVENT_MAX_PERIOD		GENMASK_ULL(31, 0)
>> +#define DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD	GENMASK_ULL(63, 0)
>> +
>> +
>> +struct dwc_pcie_pmu {
>> +	struct pci_dev		*pdev;		/* Root Port device */
> 
> If the root port is removed after the probe of this PCIe PMU driver, we'll access the NULL
> pointer. I didn't see you hold the root port to avoid the removal.
> 
>> +	u16			ras_des;	/* RAS DES capability offset */
>> +	u32			nr_lanes;
>> +
>> +	struct list_head	pmu_node;
>> +	struct hlist_node	cpuhp_node;
>> +	struct pmu		pmu;
>> +	struct perf_event	*event;
>> +	int			oncpu;
>> +};
>> +
>> +struct dwc_pcie_pmu_priv {
>> +	struct device *dev;
>> +	struct list_head pmu_nodes;
>> +};
>> +
>> +#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
>> +
> 
> somebody told me to put @pmu as the first member then this macro will have no calculation. :)
> 
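
The point about member placement can be shown with a short userspace sketch.
The container_of() below is a simplified stand-in for the kernel macro
(without its type checking), and struct pmu, pmu_wrapper_late and
pmu_wrapper_first are hypothetical reduced versions of the structures in the
patch, kept only to show the offsets involved.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified userspace equivalent of the kernel's container_of(). */
#define container_of(ptr, type, member) \
	((type *)((char *)(ptr) - offsetof(type, member)))

/* Hypothetical stand-in for the kernel's struct pmu. */
struct pmu {
	int type;
};

/* Layout as in the patch: pmu sits behind other members. */
struct pmu_wrapper_late {
	void *pdev;
	struct pmu pmu;
};

/* Layout suggested in the review: pmu is the first member. */
struct pmu_wrapper_first {
	struct pmu pmu;
	void *pdev;
};

/* Needs a real subtraction: offsetof(..., pmu) is non-zero here. */
static struct pmu_wrapper_late *to_late(struct pmu *p)
{
	assert(p != NULL);
	return container_of(p, struct pmu_wrapper_late, pmu);
}

/* offsetof(..., pmu) is 0, so this reduces to a plain pointer cast. */
static struct pmu_wrapper_first *to_first(struct pmu *p)
{
	assert(p != NULL);
	return container_of(p, struct pmu_wrapper_first, pmu);
}
```

Both helpers return the enclosing structure; with pmu as the first member the
offset is a compile-time zero, so the conversion costs nothing at run time.
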
>> +static struct platform_device *dwc_pcie_pmu_dev;
>> +static int dwc_pcie_pmu_hp_state;
>> +
>> +static ssize_t cpumask_show(struct device *dev,
>> +					 struct device_attribute *attr,
>> +					 char *buf)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(dev_get_drvdata(dev));
>> +
>> +	return cpumap_print_to_pagebuf(true, buf, cpumask_of(pcie_pmu->oncpu));
>> +}
>> +static DEVICE_ATTR_RO(cpumask);
>> +
>> +static struct attribute *dwc_pcie_pmu_cpumask_attrs[] = {
>> +	&dev_attr_cpumask.attr,
>> +	NULL
>> +};
>> +
>> +static struct attribute_group dwc_pcie_cpumask_attr_group = {
>> +	.attrs = dwc_pcie_pmu_cpumask_attrs,
>> +};
>> +
>> +struct dwc_pcie_format_attr {
>> +	struct device_attribute attr;
>> +	u64 field;
>> +	int config;
>> +};
>> +
>> +static ssize_t dwc_pcie_pmu_format_show(struct device *dev,
>> +					struct device_attribute *attr,
>> +					char *buf)
>> +{
>> +	struct dwc_pcie_format_attr *fmt = container_of(attr, typeof(*fmt), attr);
>> +	int lo = __ffs(fmt->field), hi = __fls(fmt->field);
>> +
>> +	return sysfs_emit(buf, "config:%d-%d\n", lo, hi);
>> +}
>> +
>> +#define _dwc_pcie_format_attr(_name, _cfg, _fld)				\
>> +	(&((struct dwc_pcie_format_attr[]) {{					\
>> +		.attr = __ATTR(_name, 0444, dwc_pcie_pmu_format_show, NULL),	\
>> +		.config = _cfg,							\
>> +		.field = _fld,							\
>> +	}})[0].attr.attr)
>> +
>> +#define dwc_pcie_format_attr(_name, _fld)	_dwc_pcie_format_attr(_name, 0, _fld)
>> +
>> +static struct attribute *dwc_pcie_format_attrs[] = {
>> +	dwc_pcie_format_attr(type, DWC_PCIE_CONFIG_TYPE),
>> +	dwc_pcie_format_attr(eventid, DWC_PCIE_CONFIG_EVENTID),
>> +	dwc_pcie_format_attr(lane, DWC_PCIE_CONFIG_LANE),
>> +	NULL,
>> +};
>> +
>> +static struct attribute_group dwc_pcie_format_attrs_group = {
>> +	.name = "format",
>> +	.attrs = dwc_pcie_format_attrs,
>> +};
>> +
>> +struct dwc_pcie_event_attr {
>> +	struct device_attribute attr;
>> +	enum dwc_pcie_event_type type;
>> +	u16 eventid;
>> +	u8 lane;
>> +};
>> +
>> +static ssize_t dwc_pcie_event_show(struct device *dev,
>> +				struct device_attribute *attr, char *buf)
>> +{
>> +	struct dwc_pcie_event_attr *eattr;
>> +
>> +	eattr = container_of(attr, typeof(*eattr), attr);
>> +
>> +	if (eattr->type == DWC_PCIE_LANE_EVENT)
>> +		return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
>> +				  eattr->eventid, eattr->type);
>> +
>> +	return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", eattr->eventid,
>> +		       eattr->type);
>> +}
>> +
>> +#define DWC_PCIE_EVENT_ATTR(_name, _type, _eventid, _lane)		\
>> +	(&((struct dwc_pcie_event_attr[]) {{				\
>> +		.attr = __ATTR(_name, 0444, dwc_pcie_event_show, NULL),	\
>> +		.type = _type,						\
>> +		.eventid = _eventid,					\
>> +		.lane = _lane,						\
>> +	}})[0].attr.attr)
>> +
>> +#define DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(_name, _eventid)		\
>> +	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_TIME_BASE_EVENT, _eventid, 0)
>> +#define DWC_PCIE_PMU_LANE_EVENT_ATTR(_name, _eventid)			\
>> +	DWC_PCIE_EVENT_ATTR(_name, DWC_PCIE_LANE_EVENT, _eventid, 0)
>> +
>> +static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
>> +	/* Group #0 */
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
>> +
>> +	/* Group #1 */
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
>> +
>> +	/*
>> +	 * Leave it to the user to specify the lane ID to avoid generating
>> +	 * a list of hundreds of events.
>> +	 */
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
>> +
> 
> Intended blank line?
> 
>> +	NULL
>> +};
>> +
>> +static const struct attribute_group dwc_pcie_event_attrs_group = {
>> +	.name = "events",
>> +	.attrs = dwc_pcie_pmu_time_event_attrs,
>> +};
>> +
>> +static const struct attribute_group *dwc_pcie_attr_groups[] = {
>> +	&dwc_pcie_event_attrs_group,
>> +	&dwc_pcie_format_attrs_group,
>> +	&dwc_pcie_cpumask_attr_group,
>> +	NULL
>> +};
>> +
>> +static void dwc_pcie_pmu_lane_event_enable(struct dwc_pcie_pmu *pcie_pmu,
>> +					   bool enable)
>> +{
>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>> +	u16 ras_des = pcie_pmu->ras_des;
>> +	u32 val;
>> +
>> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, &val);
>> +
>> +	/* Clear DWC_PCIE_CNT_ENABLE field first */
>> +	val &= ~DWC_PCIE_CNT_ENABLE;
>> +	if (enable)
>> +		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_ON);
>> +	else
>> +		val |= FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF);
>> +
>> +	pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL, val);
>> +}
>> +
>> +static void dwc_pcie_pmu_time_based_event_enable(struct dwc_pcie_pmu *pcie_pmu,
>> +					  bool enable)
>> +{
>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>> +	u16 ras_des = pcie_pmu->ras_des;
>> +	u32 val;
>> +
>> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
>> +			      &val);
>> +
>> +	if (enable)
>> +		val |= DWC_PCIE_TIME_BASED_CNT_ENABLE;
>> +	else
>> +		val &= ~DWC_PCIE_TIME_BASED_CNT_ENABLE;
>> +
>> +	pci_write_config_dword(pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL,
>> +			       val);
>> +}
>> +
>> +static u64 dwc_pcie_pmu_read_lane_event_counter(struct dwc_pcie_pmu *pcie_pmu)
>> +{
>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>> +	u16 ras_des = pcie_pmu->ras_des;
>> +	u32 val;
>> +
>> +	pci_read_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_DATA, &val);
>> +
>> +	return val;
>> +}
>> +
>> +static u64 dwc_pcie_pmu_read_time_based_counter(struct dwc_pcie_pmu *pcie_pmu)
>> +{
>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>> +	u16 ras_des = pcie_pmu->ras_des;
>> +	u64 count;
>> +	u32 val;
>> +
>> +	pci_read_config_dword(
>> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &val);
>> +	count = val;
>> +	count <<= 32;
>> +
>> +	pci_read_config_dword(
>> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, &val);
>> +
>> +	count += val;
>> +
>> +	return count;
>> +}
>> +
>> +static void dwc_pcie_pmu_event_update(struct perf_event *event)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>> +	u64 delta, prev, now;
>> +
>> +	do {
>> +		prev = local64_read(&hwc->prev_count);
>> +
>> +		if (type == DWC_PCIE_LANE_EVENT)
>> +			now = dwc_pcie_pmu_read_lane_event_counter(pcie_pmu);
>> +		else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> +			now = dwc_pcie_pmu_read_time_based_counter(pcie_pmu);
>> +
>> +	} while (local64_cmpxchg(&hwc->prev_count, prev, now) != prev);
>> +
>> +	if (type == DWC_PCIE_LANE_EVENT)
>> +		delta = (now - prev) & DWC_PCIE_LANE_EVENT_MAX_PERIOD;
>> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> +		delta = (now - prev) & DWC_PCIE_TIME_BASED_EVENT_MAX_PERIOD;
>> +
>> +	local64_add(delta, &event->count);
>> +}
>> +
>> +static int dwc_pcie_pmu_event_init(struct perf_event *event)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>> +	struct perf_event *sibling;
>> +	u32 lane;
>> +
>> +	if (event->attr.type != event->pmu->type)
>> +		return -ENOENT;
>> +
>> +	/* We don't support sampling */
>> +	if (is_sampling_event(event))
>> +		return -EINVAL;
>> +
>> +	/* We cannot support task bound events */
>> +	if (event->cpu < 0 || event->attach_state & PERF_ATTACH_TASK)
>> +		return -EINVAL;
>> +
>> +	if (event->group_leader != event &&
>> +	    !is_software_event(event->group_leader))
>> +		return -EINVAL;
>> +
>> +	for_each_sibling_event(sibling, event->group_leader) {
>> +		if (sibling->pmu != event->pmu && !is_software_event(sibling))
>> +			return -EINVAL;
>> +	}
>> +
>> +	if (type == DWC_PCIE_LANE_EVENT) {
>> +		lane = DWC_PCIE_EVENT_LANE(event);
>> +		if (lane < 0 || lane >= pcie_pmu->nr_lanes)
>> +			return -EINVAL;
>> +	}
>> +
>> +	event->cpu = pcie_pmu->oncpu;
>> +
>> +	return 0;
>> +}
>> +
>> +static void dwc_pcie_pmu_set_period(struct hw_perf_event *hwc)
>> +{
>> +	local64_set(&hwc->prev_count, 0);
>> +}
>> +
>> +static void dwc_pcie_pmu_event_start(struct perf_event *event, int flags)
>> +{
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>> +
>> +	hwc->state = 0;
>> +	dwc_pcie_pmu_set_period(hwc);
>> +
>> +	if (type == DWC_PCIE_LANE_EVENT)
>> +		dwc_pcie_pmu_lane_event_enable(pcie_pmu, true);
>> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> +		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, true);
>> +}
>> +
>> +static void dwc_pcie_pmu_event_stop(struct perf_event *event, int flags)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>> +	struct hw_perf_event *hwc = &event->hw;
>> +
>> +	if (event->hw.state & PERF_HES_STOPPED)
>> +		return;
>> +
>> +	if (type == DWC_PCIE_LANE_EVENT)
>> +		dwc_pcie_pmu_lane_event_enable(pcie_pmu, false);
>> +	else if (type == DWC_PCIE_TIME_BASE_EVENT)
>> +		dwc_pcie_pmu_time_based_event_enable(pcie_pmu, false);
>> +
>> +	dwc_pcie_pmu_event_update(event);
>> +	hwc->state |= PERF_HES_STOPPED | PERF_HES_UPTODATE;
>> +}
>> +
>> +static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>> +	struct hw_perf_event *hwc = &event->hw;
>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>> +	int event_id = DWC_PCIE_EVENT_ID(event);
>> +	int lane = DWC_PCIE_EVENT_LANE(event);
>> +	u16 ras_des = pcie_pmu->ras_des;
>> +	u32 ctrl;
>> +
>> +	/* Only one counter and it is in use */
>> +	if (pcie_pmu->event)
>> +		return -ENOSPC;
>> +
>> +	pcie_pmu->event = event;
>> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
>> +
>> +	if (type == DWC_PCIE_LANE_EVENT) {
>> +		/* EVENT_COUNTER_DATA_REG needs clear manually */
>> +		ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
>> +			FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
>> +			FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
>> +			FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
>> +		pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL,
>> +				       ctrl);
>> +	} else if (type == DWC_PCIE_TIME_BASE_EVENT) {
>> +		/*
>> +		 * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
>> +		 * use it with any manually controlled duration. And it is
>> +		 * cleared when next measurement starts.
>> +		 */
>> +		ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
>> +			FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
>> +				   DWC_PCIE_DURATION_MANUAL_CTL) |
>> +			DWC_PCIE_TIME_BASED_CNT_ENABLE;
>> +		pci_write_config_dword(
>> +			pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
>> +	}
>> +
>> +	if (flags & PERF_EF_START)
>> +		dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
>> +
>> +	perf_event_update_userpage(event);
>> +
>> +	return 0;
>> +}
>> +
>> +static void dwc_pcie_pmu_event_del(struct perf_event *event, int flags)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>> +
>> +	dwc_pcie_pmu_event_stop(event, flags | PERF_EF_UPDATE);
>> +	perf_event_update_userpage(event);
>> +	pcie_pmu->event = NULL;
>> +}
>> +
>> +static int __dwc_pcie_pmu_probe(struct dwc_pcie_pmu_priv *priv)
>> +{
>> +	struct pci_dev *pdev = NULL;
>> +	struct dwc_pcie_pmu *pcie_pmu;
>> +	char *name;
>> +	u32 bdf;
>> +	int ret;
>> +
>> +	INIT_LIST_HEAD(&priv->pmu_nodes);
>> +
>> +	/* Match the rootport with VSEC_RAS_DES_ID, and register a PMU for it */
>> +	for_each_pci_dev(pdev) {
>> +		u16 vsec;
>> +		u32 val;
>> +
>> +		if (!(pci_is_pcie(pdev) &&
>> +		      pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
>> +			continue;
>> +
>> +		vsec = pci_find_vsec_capability(pdev, PCI_VENDOR_ID_ALIBABA,
>> +						DWC_PCIE_VSEC_RAS_DES_ID);
>> +		if (!vsec)
>> +			continue;
>> +
>> +		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
>> +		if (PCI_VNDR_HEADER_REV(val) != 0x04 ||
>> +		    PCI_VNDR_HEADER_LEN(val) != 0x100)
>> +			continue;
>> +		pci_dbg(pdev,
>> +			"Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
>> +
>> +		bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
>> +		name = devm_kasprintf(priv->dev, GFP_KERNEL, "dwc_rootport_%x",
>> +				      bdf);
>> +		if (!name)
>> +			return -ENOMEM;
>> +
>> +		/* All checks passed, go go go */
>> +		pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
>> +		if (!pcie_pmu) {
>> +			pci_dev_put(pdev);
> 
> We need to call pci_dev_put() on every return branch above and below, and after the
> for_each_pci_dev() loop, to keep the refcount balanced.
> 
>> +			return -ENOMEM;
>> +		}
>> +
>> +		pcie_pmu->pdev = pdev;
>> +		pcie_pmu->ras_des = vsec;
>> +		pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
>> +		pcie_pmu->pmu = (struct pmu){
>> +			.module		= THIS_MODULE,
>> +			.attr_groups	= dwc_pcie_attr_groups,
>> +			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>> +			.task_ctx_nr	= perf_invalid_context,
>> +			.event_init	= dwc_pcie_pmu_event_init,
>> +			.add		= dwc_pcie_pmu_event_add,
>> +			.del		= dwc_pcie_pmu_event_del,
>> +			.start		= dwc_pcie_pmu_event_start,
>> +			.stop		= dwc_pcie_pmu_event_stop,
>> +			.read		= dwc_pcie_pmu_event_update,
>> +		};
>> +
>> +		/* Add this instance to the list used by the offline callback */
>> +		ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
>> +					       &pcie_pmu->cpuhp_node);
>> +		if (ret) {
>> +			pci_err(pcie_pmu->pdev,
>> +				"Error %d registering hotplug @%x\n", ret, bdf);
>> +			return ret;
>> +		}
>> +		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
>> +		if (ret) {
>> +			pci_err(pcie_pmu->pdev,
>> +				"Error %d registering PMU @%x\n", ret, bdf);
>> +			cpuhp_state_remove_instance_nocalls(
>> +				dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node);
>> +			return ret;
>> +		}
>> +
>> +		/* Add registered PMUs and unregister them when this driver remove */
>> +		list_add(&pcie_pmu->pmu_node, &priv->pmu_nodes);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int dwc_pcie_pmu_remove(struct platform_device *pdev)
>> +{
>> +	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
>> +	struct dwc_pcie_pmu *pcie_pmu;
>> +
>> +	list_for_each_entry(pcie_pmu, &priv->pmu_nodes, pmu_node) {
>> +		cpuhp_state_remove_instance(dwc_pcie_pmu_hp_state,
>> +					    &pcie_pmu->cpuhp_node);
>> +		perf_pmu_unregister(&pcie_pmu->pmu);
> 
> Should unregister the PMU first, to keep the order the reverse of __dwc_pcie_pmu_probe().
> 
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static int dwc_pcie_pmu_probe(struct platform_device *pdev)
>> +{
>> +	struct dwc_pcie_pmu_priv *priv;
>> +	int ret;
>> +
>> +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>> +	if (!priv)
>> +		return -ENOMEM;
>> +
>> +	priv->dev = &pdev->dev;
>> +	platform_set_drvdata(pdev, priv);
>> +
>> +	/* If one PMU registration fails, remove all. */
>> +	ret = __dwc_pcie_pmu_probe(priv);
>> +	if (ret) {
>> +		dwc_pcie_pmu_remove(pdev);
>> +		return ret;
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void dwc_pcie_pmu_migrate(struct dwc_pcie_pmu *pcie_pmu, unsigned int cpu)
>> +{
>> +	/* This PMU does NOT support interrupt, just migrate context. */
>> +	perf_pmu_migrate_context(&pcie_pmu->pmu, pcie_pmu->oncpu, cpu);
>> +	pcie_pmu->oncpu = cpu;
>> +}
>> +
>> +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu;
>> +	struct pci_dev *pdev;
>> +	int node;
>> +
>> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
>> +	pdev = pcie_pmu->pdev;
>> +	node = dev_to_node(&pdev->dev);
>> +
>> +	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&
>> +	    cpu_to_node(cpu) == node)
>> +		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
>> +
>> +	return 0;
>> +}
>> +
>> +static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
>> +{
>> +	struct dwc_pcie_pmu *pcie_pmu;
>> +	struct pci_dev *pdev;
>> +	int node;
>> +	cpumask_t mask;
>> +	unsigned int target;
>> +
>> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
>> +	if (cpu != pcie_pmu->oncpu)
>> +		return 0;
>> +
>> +	pdev = pcie_pmu->pdev;
>> +	node = dev_to_node(&pdev->dev);
>> +	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
>> +	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
>> +		target = cpumask_any(&mask);
> 
> The cpumask_of_node() only contains the online CPUs, so this branch is redundant. For arm64
> using arch_numa.c the node cpumask is updated in numa_{add, remove}_cpu(), and for other
> architectures the behaviour should be consistent. Please correct me if I'm wrong.
> 
>> +	else
>> +		target = cpumask_any_but(cpu_online_mask, cpu);
>> +	if (target < nr_cpu_ids)
>> +		dwc_pcie_pmu_migrate(pcie_pmu, target);
>> +
>> +	return 0;
>> +}
>> +
>> +static struct platform_driver dwc_pcie_pmu_driver = {
>> +	.probe = dwc_pcie_pmu_probe,
>> +	.remove = dwc_pcie_pmu_remove,
>> +	.driver = {.name = "dwc_pcie_pmu",},
>> +};
>> +
>> +static int __init dwc_pcie_pmu_init(void)
>> +{
>> +	int ret;
>> +
>> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
>> +				      "perf/dwc_pcie_pmu:online",
>> +				      dwc_pcie_pmu_online_cpu,
>> +				      dwc_pcie_pmu_offline_cpu);
>> +	if (ret < 0)
>> +		return ret;
>> +
>> +	dwc_pcie_pmu_hp_state = ret;
>> +
>> +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
>> +	if (ret) {
>> +		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
>> +		return ret;
>> +	}
>> +
>> +	dwc_pcie_pmu_dev = platform_device_register_simple(
>> +				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
>> +	if (IS_ERR(dwc_pcie_pmu_dev)) {
>> +		platform_driver_unregister(&dwc_pcie_pmu_driver);
> 
> On failure we also need to remove the cpuhp state.
> 
> Thanks,
> Yicong
> 
>> +		return PTR_ERR(dwc_pcie_pmu_dev);
>> +	}
>> +
>> +	return 0;
>> +}
>> +
>> +static void __exit dwc_pcie_pmu_exit(void)
>> +{
>> +	platform_device_unregister(dwc_pcie_pmu_dev);
>> +	platform_driver_unregister(&dwc_pcie_pmu_driver);
>> +	cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
>> +}
>> +
>> +module_init(dwc_pcie_pmu_init);
>> +module_exit(dwc_pcie_pmu_exit);
>> +
>> +MODULE_DESCRIPTION("PMU driver for DesignWare Cores PCI Express Controller");
>> +MODULE_AUTHOR("Shuai xue <xueshuai@linux.alibaba.com>");
>> +MODULE_AUTHOR("Wen Cheng <yinxuan_cw@linux.alibaba.com>");
>> +MODULE_LICENSE("GPL v2");
>>


* Re: [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver
  2023-07-27 12:52     ` Shuai Xue
@ 2023-07-28 10:18       ` Jonathan Cameron
  0 siblings, 0 replies; 31+ messages in thread
From: Jonathan Cameron @ 2023-07-28 10:18 UTC (permalink / raw)
  To: Shuai Xue
  Cc: chengyou, kaishen, helgaas, yangyicong, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song


> >   
> >> +debugging, AER error injection, and collection of statistics. To facilitate
> >> +collection of statistics, Synopsys DesignWare Cores PCIe controller  
> > 
> > "Core's"
> > 
> > (as it belongs to the core rather than intent being that it applies to plural
> > cores?)  
> 
> "Synopsys DesignWare Cores PCIe controller" is from the title from Synopsys
> databook, so I prefer to keep as it is here.

Makes sense to keep them matching.

Jonathan




* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-07-27  9:39     ` Jonathan Cameron
@ 2023-07-28 12:41       ` Shuai Xue
  2023-07-28 15:20         ` Jonathan Cameron
  2023-08-01 11:46         ` Yicong Yang
  0 siblings, 2 replies; 31+ messages in thread
From: Shuai Xue @ 2023-07-28 12:41 UTC (permalink / raw)
  To: Jonathan Cameron, Yicong Yang
  Cc: chengyou, kaishen, helgaas, will, baolin.wang, robin.murphy,
	yangyicong, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/7/27 17:39, Jonathan Cameron wrote:
> On Tue, 6 Jun 2023 23:14:07 +0800
> Yicong Yang <yangyicong@huawei.com> wrote:
> 
>> On 2023/6/6 15:49, Shuai Xue wrote:
>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>> for the T-Head Yitian SoC. Yitian is based on the Synopsys PCI Express
>>> Core controller IP, which provides a statistics feature. The PMU is not a PCIe
>>> Root Complex integrated End Point (RCiEP) device; it is only a set of register
>>> counters provided by each PCIe Root Port.
>>>
>>> To facilitate collection of statistics the controller provides the
>>> following two features for each Root Port:
>>>
>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>   low-power LTSSM state)
>>> - Event counters (Error and Non-Error for lanes)
>>>
>>> Note: there is only one counter for each type, and there is no overflow interrupt.
>>>
>>> This driver adds a PMU device for each PCIe Root Port. The PMU device is
>>> named based on the BDF of the Root Port. For example,
>>>
>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>
>>> the PMU device name for this Root Port is dwc_rootport_3018.
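
To make the naming concrete, here is a minimal userspace sketch of how the name falls out of the BDF. It mirrors the bit packing of the kernel's PCI_DEVFN() and PCI_DEVID() macros and the "dwc_rootport_%x" format string used in the patch; the helper names devfn_of(), devid_of() and rootport_pmu_name() are made up for illustration and are not part of the driver.

```c
#include <stdint.h>
#include <stdio.h>

/* Same packing as PCI_DEVFN(): device in bits 7:3, function in bits 2:0. */
static inline uint8_t devfn_of(unsigned int dev, unsigned int fn)
{
	return (uint8_t)(((dev & 0x1f) << 3) | (fn & 0x07));
}

/* Same packing as PCI_DEVID(): bus in bits 15:8, devfn in bits 7:0. */
static inline uint16_t devid_of(unsigned int bus, uint8_t devfn)
{
	return (uint16_t)(((bus & 0xff) << 8) | devfn);
}

/* Hypothetical helper: format the PMU name the way the patch does. */
static int rootport_pmu_name(char *buf, size_t len, unsigned int bus,
			     unsigned int dev, unsigned int fn)
{
	unsigned int bdf = devid_of(bus, devfn_of(dev, fn));

	return snprintf(buf, len, "dwc_rootport_%x", bdf);
}
```

For 30:03.0 the devfn is (0x03 << 3) | 0 = 0x18 and the device ID is 0x3018, which is where dwc_rootport_3018 comes from.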
>>>
>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>
>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>
>>> The average RX bandwidth can be calculated like this:
>>>
>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
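
As a worked example of the formula above, a small userspace sketch follows. It only assumes what the commit message states, namely that each count of a TLP data payload event stands for 16 bytes; the helper name tlp_bandwidth_bytes_per_sec() and the choice of seconds for the window are my own assumptions.

```c
#include <stdint.h>

/* Each count of a *_TLP_Data_Payload event represents 16 bytes. */
#define PAYLOAD_UNIT_BYTES 16u

/* Hypothetical helper: average bandwidth in bytes per second. */
static double tlp_bandwidth_bytes_per_sec(uint64_t count, double window_sec)
{
	if (window_sec <= 0.0)
		return 0.0;	/* avoid dividing by zero for an empty window */

	return (double)count * PAYLOAD_UNIT_BYTES / window_sec;
}
```

For instance, a count of 1,000,000 collected by perf stat over a 2 second window works out to 8,000,000 bytes/s.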
>>>
>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>> Reported-by: kernel test robot <lkp@intel.com>
>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
> 
> I'll review on top to avoid any duplication with Yicong.

Thank you! It also served as a reminder that I missed Yicong's email. It appears
that Thunderbird mistakenly moved his email to the junk folder, resulting in me
overlooking it.

> 
> Note I've cropped the stuff neither of us commented on so it's
> easier to spot the feedback.

Thank you for noting that. My feedback is replied inline.

> 
> Jonathan
> 
>>> ---
>>>  drivers/perf/Kconfig        |   7 +
>>>  drivers/perf/Makefile       |   1 +
>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>  3 files changed, 714 insertions(+)
>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>
>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>> index 711f82400086..6ff3921d7a62 100644
>>> --- a/drivers/perf/Kconfig
>>> +++ b/drivers/perf/Kconfig
>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>  	  event on CN10K platform.
>>>  
>>> +config DWC_PCIE_PMU
>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>> +	depends on (ARM64 && PCI)
>>> +	help
>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>> +	  monitoring event on Yitian 710 platform.
> 
> The documentation kind of implies this isn't platform specific.
> If some parts are (such as which events exist) then you may want to push
> that to userspace / perftool with appropriate matching against specific SoC.
> 
> If it is generic, then change this text to "event on platform including the Yitian 710."

It is generic, with nothing platform-specific, so I will change it as you suggested.

> 
>>> +
>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>  
>>>  source "drivers/perf/amlogic/Kconfig"
> 
>>> new file mode 100644
>>> index 000000000000..8bfcf6e0662d
>>> --- /dev/null
>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>> @@ -0,0 +1,706 @@
> 
> ...
> 
>>> +
>>> +struct dwc_pcie_pmu {
>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>
>> If the root port is removed after the probe of this PCIe PMU driver, we'll access the NULL
>> pointer. I didn't see you hold the root port to avoid the removal.

Do you mean that I should take a reference on the Root Port with pci_dev_get() when
allocating pcie_pmu?

     pcie_pmu->pdev = pci_dev_get(pdev);

>>
>>> +	u16			ras_des;	/* RAS DES capability offset */
>>> +	u32			nr_lanes;
>>> +
>>> +	struct list_head	pmu_node;
>>> +	struct hlist_node	cpuhp_node;
>>> +	struct pmu		pmu;
>>> +	struct perf_event	*event;
>>> +	int			oncpu;
>>> +};
>>> +
>>> +struct dwc_pcie_pmu_priv {
>>> +	struct device *dev;
>>> +	struct list_head pmu_nodes;
>>> +};
>>> +
>>> +#define to_dwc_pcie_pmu(p) (container_of(p, struct dwc_pcie_pmu, pmu))
>>> +  
>>
>> somebody told me to put @pmu as the first member then this macro will have no calculation. :)
>>

Aha, you are right, I will move it to be the first member.

> ...
> 
>>> +static ssize_t dwc_pcie_event_show(struct device *dev,
>>> +				struct device_attribute *attr, char *buf)
>>> +{
>>> +	struct dwc_pcie_event_attr *eattr;
>>> +
>>> +	eattr = container_of(attr, typeof(*eattr), attr);
>>> +
>>> +	if (eattr->type == DWC_PCIE_LANE_EVENT)
>>> +		return sysfs_emit(buf, "eventid=0x%x,type=0x%x,lane=?\n",
>>> +				  eattr->eventid, eattr->type);
>>> +
> 
> Elsewhere you always check for DWC_PCIE_TIME_BASE_EVENT.
> Should probably do so here as well for consistency.

Yes, I will also add the check here.

> 
>>> +	return sysfs_emit(buf, "eventid=0x%x,type=0x%x\n", eattr->eventid,
>>> +		       eattr->type);
>>> +}
> 
>>> +static struct attribute *dwc_pcie_pmu_time_event_attrs[] = {
>>> +	/* Group #0 */
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(one_cycle, 0x00),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_L0S, 0x01),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(RX_L0S, 0x02),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L0, 0x03),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1, 0x04),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_1, 0x05),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_2, 0x06),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(CFG_RCVRY, 0x07),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(TX_RX_L0S, 0x08),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(L1_AUX, 0x09),
>>> +
>>> +	/* Group #1 */
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_PCIe_TLP_Data_Payload, 0x20),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_PCIe_TLP_Data_Payload, 0x21),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Tx_CCIX_TLP_Data_Payload, 0x22),
>>> +	DWC_PCIE_PMU_TIME_BASE_EVENT_ATTR(Rx_CCIX_TLP_Data_Payload, 0x23),
>>> +
>>> +	/*
>>> +	 * Leave it to the user to specify the lane ID to avoid generating
>>> +	 * a list of hundreds of events.
>>> +	 */
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ack_dllp, 0x600),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_update_fc_dllp, 0x601),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ack_dllp, 0x602),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_update_fc_dllp, 0x603),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_nulified_tlp, 0x604),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_nulified_tlp, 0x605),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_duplicate_tl, 0x606),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_write, 0x700),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_memory_read, 0x701),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_write, 0x702),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_configuration_read, 0x703),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_write, 0x704),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_io_read, 0x705),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_without_data, 0x706),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_completion_with_data, 0x707),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_message_tlp, 0x708),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_atomic, 0x709),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_tlp_with_prefix, 0x70A),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_write, 0x70B),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_memory_read, 0x70C),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_write, 0x70F),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_io_read, 0x710),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_without_data, 0x711),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_completion_with_data, 0x712),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_message_tlp, 0x713),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_atomic, 0x714),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_tlp_with_prefix, 0x715),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(tx_ccix_tlp, 0x716),
>>> +	DWC_PCIE_PMU_LANE_EVENT_ATTR(rx_ccix_tlp, 0x717),
>>> +  
>>
>> Intended blank line?

Nope, will delete it.

>>
>>> +	NULL
>>> +};
> 
> 
> ...
> 
>>> +static u64 dwc_pcie_pmu_read_time_based_counter(struct dwc_pcie_pmu *pcie_pmu)
>>> +{
>>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>>> +	u16 ras_des = pcie_pmu->ras_des;
>>> +	u64 count;
>>> +	u32 val;
>>> +
>>> +	pci_read_config_dword(
>>> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_HIGH, &val);
>>> +	count = val;
>>> +	count <<= 32;
>>> +
>>> +	pci_read_config_dword(
>>> +		pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_DATA_REG_LOW, &val);
> 
> This looks like tearing can occur.  You probably need to protect against that
> (the usual trick is to re-read the _HIGH part and, if it changed, try again)
> 
> The hardware might prevent tearing (it would freeze the low register when you
> read the high one, then only let it change after a read of the low registers is
> done).  If that's the case - add a comment to say so.

Good catch, I will check with the hardware designer and reply later.
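
In case the hardware does not latch the low half, the re-read trick mentioned above could look roughly like the userspace sketch below. The accessor type read32_fn stands in for pci_read_config_dword(), and struct fake_counter / fake_read32() are purely illustrative scaffolding that ticks the counter on every access so a tear can be provoked; none of these names exist in the driver.

```c
#include <stdint.h>

enum { REG_LOW = 0, REG_HIGH = 1 };

/* Hypothetical accessor: reads one 32-bit half of the counter. */
typedef uint32_t (*read32_fn)(void *ctx, unsigned int reg);

/* Read HIGH, then LOW, then HIGH again; retry if HIGH moved in between. */
static uint64_t read_counter64(read32_fn rd, void *ctx)
{
	uint32_t hi, lo;

	do {
		hi = rd(ctx, REG_HIGH);
		lo = rd(ctx, REG_LOW);
	} while (hi != rd(ctx, REG_HIGH));

	return ((uint64_t)hi << 32) | lo;
}

/* Illustration only: a free-running counter that ticks on every access. */
struct fake_counter {
	uint64_t value;
};

static uint32_t fake_read32(void *ctx, unsigned int reg)
{
	struct fake_counter *c = ctx;
	uint32_t part = (reg == REG_HIGH) ? (uint32_t)(c->value >> 32)
					  : (uint32_t)c->value;

	c->value++;	/* the counter keeps running between accesses */
	return part;
}
```

With the fake counter starting at 0xFFFFFFFF, the first attempt sees the high half change from 0 to 1 and is retried, whereas a single HIGH then LOW read would have returned 0.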

> 
>>> +
>>> +	count += val;
>>> +
>>> +	return count;
>>> +}
>>> +
> 
> 
> ...
>>> +static int dwc_pcie_pmu_event_add(struct perf_event *event, int flags)
>>> +{
>>> +	struct dwc_pcie_pmu *pcie_pmu = to_dwc_pcie_pmu(event->pmu);
>>> +	struct pci_dev *pdev = pcie_pmu->pdev;
>>> +	struct hw_perf_event *hwc = &event->hw;
>>> +	enum dwc_pcie_event_type type = DWC_PCIE_EVENT_TYPE(event);
>>> +	int event_id = DWC_PCIE_EVENT_ID(event);
>>> +	int lane = DWC_PCIE_EVENT_LANE(event);
>>> +	u16 ras_des = pcie_pmu->ras_des;
>>> +	u32 ctrl;
>>> +
>>> +	/* Only one counter and it is in use */
> 
> Yikes. That's quite a restriction.  Probably good to mention in the docs.
> I'm a little confused about the architecture though - there seem to be separate
> registers for the Lane and time based events.  Can't count those at same time?
> 

I am not quite sure, I will double check it and reply later.

>>> +	if (pcie_pmu->event)
>>> +		return -ENOSPC;
>>> +
>>> +	pcie_pmu->event = event;
>>> +	hwc->state = PERF_HES_STOPPED | PERF_HES_UPTODATE;
>>> +
>>> +	if (type == DWC_PCIE_LANE_EVENT) {
>>> +		/* EVENT_COUNTER_DATA_REG needs clear manually */
>>> +		ctrl = FIELD_PREP(DWC_PCIE_CNT_EVENT_SEL, event_id) |
>>> +			FIELD_PREP(DWC_PCIE_CNT_LANE_SEL, lane) |
>>> +			FIELD_PREP(DWC_PCIE_CNT_ENABLE, DWC_PCIE_PER_EVENT_OFF) |
>>> +			FIELD_PREP(DWC_PCIE_EVENT_CLEAR, DWC_PCIE_EVENT_PER_CLEAR);
>>> +		pci_write_config_dword(pdev, ras_des + DWC_PCIE_EVENT_CNT_CTL,
>>> +				       ctrl);
>>> +	} else if (type == DWC_PCIE_TIME_BASE_EVENT) {
>>> +		/*
>>> +		 * TIME_BASED_ANAL_DATA_REG is a 64 bit register, we can safely
>>> +		 * use it with any manually controlled duration. And it is
>>> +		 * cleared when next measurement starts.
>>> +		 */
>>> +		ctrl = FIELD_PREP(DWC_PCIE_TIME_BASED_REPORT_SEL, event_id) |
>>> +			FIELD_PREP(DWC_PCIE_TIME_BASED_DURATION_SEL,
>>> +				   DWC_PCIE_DURATION_MANUAL_CTL) |
>>> +			DWC_PCIE_TIME_BASED_CNT_ENABLE;
>>> +		pci_write_config_dword(
>>> +			pdev, ras_des + DWC_PCIE_TIME_BASED_ANAL_CTL, ctrl);
>>> +	}
>>> +
>>> +	if (flags & PERF_EF_START)
>>> +		dwc_pcie_pmu_event_start(event, PERF_EF_RELOAD);
>>> +
>>> +	perf_event_update_userpage(event);
>>> +
>>> +	return 0;
>>> +}
> ...
> 
>>> +static int __dwc_pcie_pmu_probe(struct dwc_pcie_pmu_priv *priv)
>>> +{
>>> +	struct pci_dev *pdev = NULL;
>>> +	struct dwc_pcie_pmu *pcie_pmu;
>>> +	char *name;
>>> +	u32 bdf;
>>> +	int ret;
>>> +
>>> +	INIT_LIST_HEAD(&priv->pmu_nodes);
>>> +
>>> +	/* Match the rootport with VSEC_RAS_DES_ID, and register a PMU for it */
>>> +	for_each_pci_dev(pdev) {
>>> +		u16 vsec;
>>> +		u32 val;
>>> +
>>> +		if (!(pci_is_pcie(pdev) &&
>>> +		      pci_pcie_type(pdev) == PCI_EXP_TYPE_ROOT_PORT))
>>> +			continue;
>>> +
>>> +		vsec = pci_find_vsec_capability(pdev, PCI_VENDOR_ID_ALIBABA,
>>> +						DWC_PCIE_VSEC_RAS_DES_ID);
>>> +		if (!vsec)
>>> +			continue;
>>> +
>>> +		pci_read_config_dword(pdev, vsec + PCI_VNDR_HEADER, &val);
>>> +		if (PCI_VNDR_HEADER_REV(val) != 0x04 ||
>>> +		    PCI_VNDR_HEADER_LEN(val) != 0x100)
>>> +			continue;
>>> +		pci_dbg(pdev,
>>> +			"Detected PCIe Vendor-Specific Extended Capability RAS DES\n");
>>> +
>>> +		bdf = PCI_DEVID(pdev->bus->number, pdev->devfn);
>>> +		name = devm_kasprintf(priv->dev, GFP_KERNEL, "dwc_rootport_%x",
>>> +				      bdf);
>>> +		if (!name)
>>> +			return -ENOMEM;
>>> +
>>> +		/* All checks passed, go go go */
>>> +		pcie_pmu = devm_kzalloc(&pdev->dev, sizeof(*pcie_pmu), GFP_KERNEL);
>>> +		if (!pcie_pmu) {
>>> +			pci_dev_put(pdev);  
>>
>> We need to call pci_dev_put() on every return branch above and below, and after the
>> for_each_pci_dev() loop, to keep the refcount balanced.
> 
> Good spot. I'd use a goto for this given there are lots of places.

Forgive me, this was already caught by other reviewers. I missed the other return
branches and will fix them with a goto.
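
Roughly, the shape I have in mind is the userspace model below. It assumes an iterator with for_each_pci_dev()-like behaviour (drop the reference on the previous device, take one on the next), so leaving the loop early leaves one reference held. struct fake_dev, next_dev(), probe_all() and the fail_at parameter are invented for the sketch and only simulate an allocation failure.

```c
#include <errno.h>
#include <stddef.h>

struct fake_dev {
	int refcount;
	int has_cap;	/* stands in for "has the RAS DES capability" */
};

static void dev_get(struct fake_dev *d) { d->refcount++; }
static void dev_put(struct fake_dev *d) { if (d) d->refcount--; }

/* Drops the reference on prev and returns the next device with a reference held. */
static struct fake_dev *next_dev(struct fake_dev *devs, size_t n,
				 struct fake_dev *prev)
{
	size_t i = prev ? (size_t)(prev - devs) + 1 : 0;

	dev_put(prev);
	if (i >= n)
		return NULL;
	dev_get(&devs[i]);
	return &devs[i];
}

/* fail_at is the index at which a simulated allocation fails, or -1 for none. */
static int probe_all(struct fake_dev *devs, size_t n, int fail_at)
{
	struct fake_dev *d = NULL;
	int ret;

	while ((d = next_dev(devs, n, d)) != NULL) {
		if (!d->has_cap)
			continue;
		if ((int)(d - devs) == fail_at) {
			ret = -ENOMEM;
			goto out_put;	/* single exit for every error branch */
		}
		/* per-device registration would go here */
	}
	return 0;

out_put:
	dev_put(d);	/* balance the reference the iterator still holds */
	return ret;
}
```

Every error branch inside the loop jumps to the same label, so the reference held by the iterator is dropped exactly once no matter where the failure happens.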

> 
>>
>>> +			return -ENOMEM;
>>> +		}
>>> +
>>> +		pcie_pmu->pdev = pdev;
>>> +		pcie_pmu->ras_des = vsec;
>>> +		pcie_pmu->nr_lanes = pcie_get_width_cap(pdev);
>>> +		pcie_pmu->pmu = (struct pmu){
>>> +			.module		= THIS_MODULE,
>>> +			.attr_groups	= dwc_pcie_attr_groups,
>>> +			.capabilities	= PERF_PMU_CAP_NO_EXCLUDE,
>>> +			.task_ctx_nr	= perf_invalid_context,
>>> +			.event_init	= dwc_pcie_pmu_event_init,
>>> +			.add		= dwc_pcie_pmu_event_add,
>>> +			.del		= dwc_pcie_pmu_event_del,
>>> +			.start		= dwc_pcie_pmu_event_start,
>>> +			.stop		= dwc_pcie_pmu_event_stop,
>>> +			.read		= dwc_pcie_pmu_event_update,
>>> +		};
>>> +
>>> +		/* Add this instance to the list used by the offline callback */
>>> +		ret = cpuhp_state_add_instance(dwc_pcie_pmu_hp_state,
>>> +					       &pcie_pmu->cpuhp_node);
>>> +		if (ret) {
>>> +			pci_err(pcie_pmu->pdev,
>>> +				"Error %d registering hotplug @%x\n", ret, bdf);
>>> +			return ret;
>>> +		}
> 
> Here you mix non devm_ handling in mid way through a series of devm_ calls.
> Whilst I 'think' what you have here is fine, I prefer to minimize thinking
> whilst reviewing and using devm_add_action_or_reset() with callbacks
> in appropriate places would ensure automatic unwinding in the error
> path deals with everything in the reverse order of setup.
> 
> You just need two instances - one to unwind the cpuhp_state_add_instance() and
> one to unwind the perf_pmu_register()

Cool, devm_add_action_or_reset saves my life. I will use it.
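
To check my understanding of the ordering, here is a tiny userspace model of the behaviour I expect: actions are recorded as setup proceeds and run in reverse order on unwind, and if recording fails the action runs immediately. This is not the kernel API; struct action_stack, add_action_or_reset(), unwind() and the two undo callbacks are illustrative stand-ins.

```c
#include <stddef.h>

#define MAX_ACTIONS 8

typedef void (*action_fn)(void *data);

struct action_stack {
	struct {
		action_fn fn;
		void *data;
	} slot[MAX_ACTIONS];
	size_t count;
};

/* Record an undo action; if it cannot be recorded, run it right away. */
static int add_action_or_reset(struct action_stack *s, action_fn fn, void *data)
{
	if (s->count == MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	s->slot[s->count].fn = fn;
	s->slot[s->count].data = data;
	s->count++;
	return 0;
}

/* Run all recorded actions, most recently added first. */
static void unwind(struct action_stack *s)
{
	while (s->count > 0) {
		s->count--;
		s->slot[s->count].fn(s->slot[s->count].data);
	}
}

/* Illustration only: log which undo step ran, in order. */
static char unwind_log[16];
static size_t unwind_len;

static void undo_cpuhp(void *data) { (void)data; unwind_log[unwind_len++] = 'H'; }
static void undo_pmu(void *data) { (void)data; unwind_log[unwind_len++] = 'P'; }
```

Registering undo_cpuhp after the hotplug instance is added and undo_pmu after the PMU is registered means the PMU is torn down first on unwind, which also gives the reverse order asked for in remove().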

>  
>>> +		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
>>> +		if (ret) {
>>> +			pci_err(pcie_pmu->pdev,
>>> +				"Error %d registering PMU @%x\n", ret, bdf);
>>> +			cpuhp_state_remove_instance_nocalls(
>>> +				dwc_pcie_pmu_hp_state, &pcie_pmu->cpuhp_node);
>>> +			return ret;
>>> +		}
>>> +
>>> +		/* Add registered PMUs and unregister them when this driver remove */
>>> +		list_add(&pcie_pmu->pmu_node, &priv->pmu_nodes);
> 
> This handling would be replaced by the tracking devm is doing for us. So I think
> there will be no need for the list.

You are right, will remove it.

> 
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int dwc_pcie_pmu_remove(struct platform_device *pdev)
>>> +{
>>> +	struct dwc_pcie_pmu_priv *priv = platform_get_drvdata(pdev);
>>> +	struct dwc_pcie_pmu *pcie_pmu;
>>> +
>>> +	list_for_each_entry(pcie_pmu, &priv->pmu_nodes, pmu_node) {
>>> +		cpuhp_state_remove_instance(dwc_pcie_pmu_hp_state,
>>> +					    &pcie_pmu->cpuhp_node);
>>> +		perf_pmu_unregister(&pcie_pmu->pmu);  
>>
>> Should unregister the PMU first, to keep the order the reverse of __dwc_pcie_pmu_probe().
> These two could have been handled via appropriate devm_add_action_or_reset()
> above and let that infrastructure unwind for us in the error path.
> 
> If anyone fixes the whole pmu drivers aren't removable mess, then we will
> also end up with remove handling for free :)

As replied above, will use devm_add_action_or_reset.
> 
>>
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int dwc_pcie_pmu_probe(struct platform_device *pdev)
>>> +{
>>> +	struct dwc_pcie_pmu_priv *priv;
>>> +	int ret;
>>> +
>>> +	priv = devm_kzalloc(&pdev->dev, sizeof(*priv), GFP_KERNEL);
>>> +	if (!priv)
>>> +		return -ENOMEM;
>>> +
>>> +	priv->dev = &pdev->dev;
>>> +	platform_set_drvdata(pdev, priv);
>>> +
>>> +	/* If one PMU registration fails, remove all. */
>>> +	ret = __dwc_pcie_pmu_probe(priv);
>>> +	if (ret) {
>>> +		dwc_pcie_pmu_remove(pdev);
> 
> There is a bit of mixing of devm and not here which makes things somewhat
> hard to reason about.  Perhaps take the whole unwind flow over to devm managed.
> See above.
> 

Got it, will do that.

>>> +		return ret;
>>> +	}
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static void dwc_pcie_pmu_migrate(struct dwc_pcie_pmu *pcie_pmu, unsigned int cpu)
>>> +{
>>> +	/* This PMU does NOT support interrupt, just migrate context. */
>>> +	perf_pmu_migrate_context(&pcie_pmu->pmu, pcie_pmu->oncpu, cpu);
>>> +	pcie_pmu->oncpu = cpu;
>>> +}
>>> +
>>> +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
>>> +{
>>> +	struct dwc_pcie_pmu *pcie_pmu;
>>> +	struct pci_dev *pdev;
>>> +	int node;
>>> +
>>> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
>>> +	pdev = pcie_pmu->pdev;
>>> +	node = dev_to_node(&pdev->dev);
>>> +
>>> +	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&
> 
> Perhaps worth a comment on when you might see node == NUMA_NO_NODE?
> Beyond NUMA being entirely disabled, I'd hope that never happens and for that you
> might be able to use a compile time check.
> 
> I wonder if this can be simplified by a flag that says if we are already in the
> right node? Might be easier to follow than having similar dance in online and offline
> to figure that out.

Ok, I will add a comment for NUMA_NO_NODE. If there is no NUMA support, I think
any CPU is fine to bind to.

pcie_pmu->on_cpu may be a good choice to use as a flag, right? pcie_pmu->on_cpu
will be set to -1 when pcie_pmu is allocated and then checked first in
dwc_pcie_pmu_online_cpu().

Then, the code will be:

static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
{
	struct dwc_pcie_pmu *pcie_pmu;
	struct pci_dev *pdev;
	int node;

	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
	/* If another CPU is already managing this PMU, simply return. */
	if (pcie_pmu->on_cpu != -1)
		return 0;

	pdev = pcie_pmu->pdev;
	node = dev_to_node(&pdev->dev);

	/* Select the first CPU if no numa support. */
	if (node == NUMA_NO_NODE)
		pcie_pmu->on_cpu = cpu;
	else if (cpu_to_node(pcie_pmu->on_cpu) != node &&
		 cpu_to_node(cpu) == node)
		dwc_pcie_pmu_migrate(pcie_pmu, cpu);

	return 0;
}
> 
> 
>>> +	    cpu_to_node(cpu) == node)
>>> +		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static int dwc_pcie_pmu_offline_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
>>> +{
>>> +	struct dwc_pcie_pmu *pcie_pmu;
>>> +	struct pci_dev *pdev;
>>> +	int node;
>>> +	cpumask_t mask;
>>> +	unsigned int target;
>>> +
>>> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
>>> +	if (cpu != pcie_pmu->oncpu)
>>> +		return 0;
>>> +
>>> +	pdev = pcie_pmu->pdev;
>>> +	node = dev_to_node(&pdev->dev);
>>> +	if (cpumask_and(&mask, cpumask_of_node(node), cpu_online_mask) &&
>>> +	    cpumask_andnot(&mask, &mask, cpumask_of(cpu)))
>>> +		target = cpumask_any(&mask);  
>>
>> The cpumask_of_node() only contains the online CPUs so this branch is redundant. For arm64
>> using arch_numa.c the node cpumask is updated in numa_{add, remove}_cpu() and for other
>> architectures the behaviour should be consistent. Please correct me if I'm wrong.

I am afraid that the behaviour is not consistent across all architectures, and
cpumask_of_node() may contain offline CPUs.

cpu_online_mask has bit 'cpu' set iff the CPU is available to the scheduler, and it is updated by:

	set_cpu_online(cpu, true);
	set_cpu_online(cpu, false);


cpumask_of_node() is an interface to `node_to_cpumask_map`, which is updated by:

	numa_{add, remove}_cpu()


On arm64, when a CPU receives an IPI_CPU_STOP interrupt, local_cpu_stop() sets the current CPU offline,
but the CPU is not removed from cpumask_of_node().


For the ARM64 and RISC-V architectures, numa_remove_cpu() and set_cpu_online(cpu, false)
are both executed in __cpu_disable() when a CPU is brought down. But for arm32, only
set_cpu_online(cpu, false) is called in __cpu_disable().
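
To make the difference concrete, here is a small userspace model of the
target selection done in dwc_pcie_pmu_offline_cpu(). It is only a sketch:
plain 64-bit masks stand in for cpumask_t, and first_cpu(), pick_target()
and NO_TARGET are names made up for this example, not kernel APIs.

```c
#include <assert.h>
#include <stdint.h>

#define NO_TARGET (-1)	/* plays the role of "target >= nr_cpu_ids" */

/* Return the lowest CPU number set in mask, or NO_TARGET if it is empty. */
static int first_cpu(uint64_t mask)
{
	for (int cpu = 0; cpu < 64; cpu++)
		if (mask & (UINT64_C(1) << cpu))
			return cpu;
	return NO_TARGET;
}

/*
 * Pick a CPU to take over the PMU context when 'cpu' goes offline.
 * node_mask may still contain offline CPUs, so it is filtered through
 * online_mask before a node-local candidate is chosen.
 */
static int pick_target(uint64_t node_mask, uint64_t online_mask, int cpu)
{
	uint64_t self, local;

	assert(cpu >= 0 && cpu < 64);
	self = UINT64_C(1) << cpu;
	local = node_mask & online_mask & ~self;

	if (local)
		return first_cpu(local);

	/* No usable CPU on the device's node: fall back to any online CPU. */
	return first_cpu(online_mask & ~self);
}
```

With node CPUs {2,3}, online CPUs {0,1,2} and CPU 2 going offline,
pick_target(0xC, 0x7, 2) falls back to CPU 0. Without the online filter the
node mask alone would have offered CPU 3, which is offline in this model.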


>>
>>> +	else
>>> +		target = cpumask_any_but(cpu_online_mask, cpu);
> 
> If following above suggestion, would set flag to say in wrong node here - and wherever
> you end up in a node to start with...

Based on the above, I will ignore this comment.

> 
> 
>>> +	if (target < nr_cpu_ids)
>>> +		dwc_pcie_pmu_migrate(pcie_pmu, target);
>>> +
>>> +	return 0;
>>> +}
>>> +
>>> +static struct platform_driver dwc_pcie_pmu_driver = {
>>> +	.probe = dwc_pcie_pmu_probe,
>>> +	.remove = dwc_pcie_pmu_remove,
>>> +	.driver = {.name = "dwc_pcie_pmu",},
>>> +};
>>> +
>>> +static int __init dwc_pcie_pmu_init(void)
>>> +{
>>> +	int ret;
>>> +
>>> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
>>> +				      "perf/dwc_pcie_pmu:online",
>>> +				      dwc_pcie_pmu_online_cpu,
>>> +				      dwc_pcie_pmu_offline_cpu);
>>> +	if (ret < 0)
>>> +		return ret;
>>> +
>>> +	dwc_pcie_pmu_hp_state = ret;
>>> +
>>> +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
>>> +	if (ret) {
>>> +		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
>>> +		return ret;
>>> +	}
>>> +
>>> +	dwc_pcie_pmu_dev = platform_device_register_simple(
>>> +				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
>>> +	if (IS_ERR(dwc_pcie_pmu_dev)) {
>>> +		platform_driver_unregister(&dwc_pcie_pmu_driver);  
>>
>> On failure we also need to remove cpuhp state as well.
> 
> I'd suggest using gotos and a single error handling block. That
> makes it both harder to forget things like this and easier to
> compare that block with what happens in exit() - so slightly 
> easier to review!

Given that we have an appropriate way to tear down the PMUs via devm_add_action_or_reset(),
I am going to remove the redundant probe/remove framework via platform_driver_{un}register().
The for_each probe process in __dwc_pcie_pmu_probe() will be moved into dwc_pcie_pmu_init().
Is that a better way?

Thank you very much for your valuable comments.

Best Regards,
Shuai


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-27  3:45           ` Shuai Xue
@ 2023-07-28 13:39             ` Will Deacon
  2023-07-31  7:30               ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Will Deacon @ 2023-07-28 13:39 UTC (permalink / raw)
  To: Shuai Xue
  Cc: Bjorn Helgaas, Jonathan Cameron, chengyou, kaishen, yangyicong,
	baolin.wang, robin.murphy, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song

On Thu, Jul 27, 2023 at 11:45:22AM +0800, Shuai Xue wrote:
> 
> 
> On 2023/7/26 04:59, Bjorn Helgaas wrote:
> > On Mon, Jul 24, 2023 at 10:18:07AM +0100, Jonathan Cameron wrote:
> >> On Mon, 24 Jul 2023 10:34:08 +0800
> >> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
> >>> On 2023/7/10 20:04, Shuai Xue wrote:
> >>>> On 2023/6/16 16:39, Shuai Xue wrote:  
> >>>>> On 2023/6/6 15:49, Shuai Xue wrote:  
> > 
> >>>>>> This patchset adds the PCIe Performance Monitoring Unit (PMU) driver support
> >>>>>> for T-Head Yitian 710 SoC chip. Yitian 710 is based on the Synopsys PCI Express
> >>>>>> Core controller IP which provides statistics feature.
> > 
> >> ...
> >> Really a question for Bjorn I think, but here is my 2 cents...
> >>
> >> The problem here is that we need to do that fundamental redesign of the
> >> way the PCI ports drivers work.  I'm not sure there is a path to merging
> >> this until that is done.  The bigger problem is that I'm not sure anyone
> >> is actively looking at that yet.  I'd like to look at this (as I have
> >> the same problem for some other drivers), but it is behind various
> >> other things on my todo list.
> >>
> >> Bjorn might be persuaded on a temporary solution, but that would come
> >> with some maintenance problems, particularly when we try to do it
> >> 'right' in the future.  Maybe adding another service driver would be
> >> a stop gap as long as we know we won't keep doing so for ever. Not sure.
> > 
> > I think the question here is around the for_each_pci_dev() in
> > __dwc_pcie_pmu_probe()?  I don't *like* that because of the
> > assumptions it breaks (autoload doesn't work, hotplug doesn't work),
> > but:
> > 
> >   - There are several other drivers that also do this,
> >   - I don't have a better suggest for any of them,
> >   - It's not a drivers/pci thing, so not really up to me anyway,
> > 
> > so I don't have any problem with this being merged as-is, as long as
> > you can live with the limitations.
> > 
> > I don't think this series does anything to work around those
> > limitations, i.e., it doesn't make up fake device IDs for module
> > loading or fake events for hotplug, so it seems like we could improve
> > the implementation later if we ever have a way to do it.
> > 
> > Bjorn
> 
> + Will
> 
> Ok, thank you for confirmation, Bjorn. Then it comes to perf driver parts and
> it is really a question for @Will I think.
> 
> What's your opinion about merging this patch set, @Will?

No fundamental objection from me, but I'll have a closer look when you
post a version addressing the feedback from Jonathan and Yicong.

Cheers,

Will

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-07-28 12:41       ` Shuai Xue
@ 2023-07-28 15:20         ` Jonathan Cameron
  2023-08-01 11:46         ` Yicong Yang
  1 sibling, 0 replies; 31+ messages in thread
From: Jonathan Cameron @ 2023-07-28 15:20 UTC (permalink / raw)
  To: Shuai Xue
  Cc: Yicong Yang, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, yangyicong, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song

...


> >>> +
> >>> +static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> >>> +{
> >>> +	struct dwc_pcie_pmu *pcie_pmu;
> >>> +	struct pci_dev *pdev;
> >>> +	int node;
> >>> +
> >>> +	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> >>> +	pdev = pcie_pmu->pdev;
> >>> +	node = dev_to_node(&pdev->dev);
> >>> +
> >>> +	if (node != NUMA_NO_NODE && cpu_to_node(pcie_pmu->oncpu) != node &&  
> > 
> > Perhaps worth a comment on when you might see node == NUMA_NO_NODE?
> > Beyond NUMA being entirely disabled, I'd hope that never happens and for that you
> > might be able to use a compile time check.
> > 
> > I wonder if this can be simplified by a flag that says if we are already in the
> > right node? Might be easier to follow than having similar dance in online and offline
> > to figure that out.  
> 
> Ok, I will add a comment for NUMA_NO_NODE. If there is no NUMA support, I think
> any CPU is fine to bind to.

Agreed. I would add a comment on that being the intent.

> 
> pcie_pmu->on_cpu may be a good choice to use as a flag, right? pcie_pmu->on_cpu
> will be set to -1 when pcie_pmu is allocated and then checked first in
> dwc_pcie_pmu_online_cpu().

I think you still want to know whether it's in the right node - as maybe
there are no local CPUs available at startup.
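
A rough userspace sketch of that idea, for illustration only: the first CPU
to come online takes the PMU even if it is remote, and a later CPU on the
device's node takes over from a remote owner. struct pmu_model,
model_cpu_to_node(), model_online_cpu(), NO_NODE, NO_CPU and the 4-CPU
topology are all assumptions made up for this example; they are not the
driver's code.

```c
#include <assert.h>

#define NO_NODE (-1)	/* stands in for NUMA_NO_NODE */
#define NO_CPU  (-1)	/* no CPU owns the PMU context yet */

struct pmu_model {
	int dev_node;	/* NUMA node of the Root Port, or NO_NODE */
	int on_cpu;	/* CPU currently owning the PMU context, or NO_CPU */
};

/* Hypothetical topology: CPUs 0-1 on node 0, CPUs 2-3 on node 1. */
static int model_cpu_to_node(int cpu)
{
	assert(cpu >= 0 && cpu < 4);
	return cpu / 2;
}

/* Called for each CPU that comes online; returns the owner afterwards. */
static int model_online_cpu(struct pmu_model *pmu, int cpu)
{
	if (pmu->on_cpu == NO_CPU) {
		/* First CPU seen: take it, even if it is on a remote node. */
		pmu->on_cpu = cpu;
	} else if (pmu->dev_node != NO_NODE &&
		   model_cpu_to_node(pmu->on_cpu) != pmu->dev_node &&
		   model_cpu_to_node(cpu) == pmu->dev_node) {
		/* Owner is remote and a node-local CPU has arrived: migrate. */
		pmu->on_cpu = cpu;
	}

	return pmu->on_cpu;
}
```

Checking on_cpu alone would stop at the first branch; the second branch is
what keeps track of whether the owner is on the right node.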

> 
> Then, the code will be:
> 
> static int dwc_pcie_pmu_online_cpu(unsigned int cpu, struct hlist_node *cpuhp_node)
> {
> 	struct dwc_pcie_pmu *pcie_pmu;
> 	struct pci_dev *pdev;
> 	int node;
> 
> 	pcie_pmu = hlist_entry_safe(cpuhp_node, struct dwc_pcie_pmu, cpuhp_node);
> 	/* If another CPU is already managing this PMU, simply return. */
> 	if (pcie_pmu->on_cpu != -1)
> 		return 0;
> 
> 	pdev = pcie_pmu->pdev;
> 	node = dev_to_node(&pdev->dev);
> 
> 	/* Select the first CPU if no numa support. */
> 	if (node == NUMA_NO_NODE)
> 		pcie_pmu->on_cpu = cpu;
> 	else if (cpu_to_node(pcie_pmu->on_cpu) != node &&
> 		 cpu_to_node(cpu) == node)
> 		dwc_pcie_pmu_migrate(pcie_pmu, cpu);
> 
> 	return 0;
> }
> > 
> >>> +static int __init dwc_pcie_pmu_init(void)
> >>> +{
> >>> +	int ret;
> >>> +
> >>> +	ret = cpuhp_setup_state_multi(CPUHP_AP_ONLINE_DYN,
> >>> +				      "perf/dwc_pcie_pmu:online",
> >>> +				      dwc_pcie_pmu_online_cpu,
> >>> +				      dwc_pcie_pmu_offline_cpu);
> >>> +	if (ret < 0)
> >>> +		return ret;
> >>> +
> >>> +	dwc_pcie_pmu_hp_state = ret;
> >>> +
> >>> +	ret = platform_driver_register(&dwc_pcie_pmu_driver);
> >>> +	if (ret) {
> >>> +		cpuhp_remove_multi_state(dwc_pcie_pmu_hp_state);
> >>> +		return ret;
> >>> +	}
> >>> +
> >>> +	dwc_pcie_pmu_dev = platform_device_register_simple(
> >>> +				"dwc_pcie_pmu", PLATFORM_DEVID_NONE, NULL, 0);
> >>> +	if (IS_ERR(dwc_pcie_pmu_dev)) {
> >>> +		platform_driver_unregister(&dwc_pcie_pmu_driver);    
> >>
> >> On failure we also need to remove cpuhp state as well.  
> > 
> > I'd suggest using gotos and a single error handling block. That
> > makes it both harder to forget things like this and easier to
> > compare that block with what happens in exit() - so slightly 
> > easier to review!  
> 
> Given that we have an appropriate way to tear down the PMUs via devm_add_action_or_reset(),
> I am going to remove the redundant probe/remove framework via platform_driver_{un}register().
> The for_each probe process in __dwc_pcie_pmu_probe() will be moved into dwc_pcie_pmu_init().
> Is that a better way?

I think I'd prefer to see a standard driver creation / probe flow even if you could in theory
avoid it.
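
For reference, the goto-based single error block suggested earlier for
init() could be modelled roughly as below. This is a userspace sketch under
stated assumptions, not the driver: step_setup() and step_teardown() stand in
for cpuhp_setup_state_multi(), platform_driver_register() and
platform_device_register_simple() and their counterparts, and fail_at is a
made-up knob to inject a failure.

```c
#include <assert.h>
#include <stdbool.h>

enum { STEP_HP_STATE, STEP_DRIVER, STEP_DEVICE, NR_STEPS };

static bool step_done[NR_STEPS];	/* which setup steps are currently live */
static int fail_at = -1;		/* inject a failure at this step, -1 = none */

static int step_setup(int step)
{
	assert(step >= 0 && step < NR_STEPS);
	if (step == fail_at)
		return -1;
	step_done[step] = true;
	return 0;
}

static void step_teardown(int step)
{
	assert(step >= 0 && step < NR_STEPS);
	step_done[step] = false;
}

/* init with a single error block: the labels undo steps in reverse order. */
static int model_init(void)
{
	int ret;

	ret = step_setup(STEP_HP_STATE);
	if (ret)
		return ret;

	ret = step_setup(STEP_DRIVER);
	if (ret)
		goto err_hp_state;

	ret = step_setup(STEP_DEVICE);
	if (ret)
		goto err_driver;

	return 0;

err_driver:
	step_teardown(STEP_DRIVER);
err_hp_state:
	step_teardown(STEP_HP_STATE);
	return ret;
}

/* exit mirrors the error block, so the two are easy to compare. */
static void model_exit(void)
{
	step_teardown(STEP_DEVICE);
	step_teardown(STEP_DRIVER);
	step_teardown(STEP_HP_STATE);
}
```

A failure at the last step falls through both labels, so the hotplug state
is released as well, which is the case that was missed in the posted init().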

Jonathan

> 
> Thank you very much for your valuable comments.
> 
> Best Regards,
> Shuai
> 
> 


^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support
  2023-07-28 13:39             ` Will Deacon
@ 2023-07-31  7:30               ` Shuai Xue
  0 siblings, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-07-31  7:30 UTC (permalink / raw)
  To: Will Deacon
  Cc: Bjorn Helgaas, Jonathan Cameron, chengyou, kaishen, yangyicong,
	baolin.wang, robin.murphy, linux-kernel, linux-arm-kernel,
	linux-pci, rdunlap, mark.rutland, zhuo.song



On 2023/7/28 21:39, Will Deacon wrote:
> On Thu, Jul 27, 2023 at 11:45:22AM +0800, Shuai Xue wrote:
>>
>>
>> On 2023/7/26 04:59, Bjorn Helgaas wrote:
>>> On Mon, Jul 24, 2023 at 10:18:07AM +0100, Jonathan Cameron wrote:
>>>> On Mon, 24 Jul 2023 10:34:08 +0800
>>>> Shuai Xue <xueshuai@linux.alibaba.com> wrote:
>>>>> On 2023/7/10 20:04, Shuai Xue wrote:
>>>>>> On 2023/6/16 16:39, Shuai Xue wrote:  
>>>>>>> On 2023/6/6 15:49, Shuai Xue wrote:  
>>>
>>>>>>>> This patchset adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>>>> for T-Head Yitian 710 SoC chip. Yitian 710 is based on the Synopsys PCI Express
>>>>>>>> Core controller IP which provides statistics feature.
>>>
>>>> ...
>>>> Really a question for Bjorn I think, but here is my 2 cents...
>>>>
>>>> The problem here is that we need to do that fundamental redesign of the
>>>> way the PCI ports drivers work.  I'm not sure there is a path to merging
>>>> this until that is done.  The bigger problem is that I'm not sure anyone
>>>> is actively looking at that yet.  I'd like to look at this (as I have
>>>> the same problem for some other drivers), but it is behind various
>>>> other things on my todo list.
>>>>
>>>> Bjorn might be persuaded on a temporary solution, but that would come
>>>> with some maintenance problems, particularly when we try to do it
>>>> 'right' in the future.  Maybe adding another service driver would be
>>>> a stop gap as long as we know we won't keep doing so for ever. Not sure.
>>>
>>> I think the question here is around the for_each_pci_dev() in
>>> __dwc_pcie_pmu_probe()?  I don't *like* that because of the
>>> assumptions it breaks (autoload doesn't work, hotplug doesn't work),
>>> but:
>>>
>>>   - There are several other drivers that also do this,
>>>   - I don't have a better suggest for any of them,
>>>   - It's not a drivers/pci thing, so not really up to me anyway,
>>>
>>> so I don't have any problem with this being merged as-is, as long as
>>> you can live with the limitations.
>>>
>>> I don't think this series does anything to work around those
>>> limitations, i.e., it doesn't make up fake device IDs for module
>>> loading or fake events for hotplug, so it seems like we could improve
>>> the implementation later if we ever have a way to do it.
>>>
>>> Bjorn
>>
>> + Will
>>
>> Ok, thank you for confirmation, Bjorn. Then it comes to perf driver parts and
>> it is really a question for @Will I think.
>>
>> What's your opinion about merging this patch set, @Will?
> 
> No fundamental objection from me, but I'll have a closer look when you
> post a version addressing the feedback from Jonathan and Yicong.

Thanks for your input! I appreciate that you don't have any fundamental objections
to merging the patch set. I'll definitely take into account the feedback from Jonathan
and Yicong before posting a revised version.


Best Regards,
Cheers.
Shuai

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-07-28 12:41       ` Shuai Xue
  2023-07-28 15:20         ` Jonathan Cameron
@ 2023-08-01 11:46         ` Yicong Yang
  2023-08-04  1:39           ` Shuai Xue
  1 sibling, 1 reply; 31+ messages in thread
From: Yicong Yang @ 2023-08-01 11:46 UTC (permalink / raw)
  To: Shuai Xue, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On 2023/7/28 20:41, Shuai Xue wrote:
> 
> 
> On 2023/7/27 17:39, Jonathan Cameron wrote:
>> On Tue, 6 Jun 2023 23:14:07 +0800
>> Yicong Yang <yangyicong@huawei.com> wrote:
>>
>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>> provided by each PCIe Root Port.
>>>>
>>>> To facilitate collection of statistics the controller provides the
>>>> following two features for each Root Port:
>>>>
>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>   low-power LTSSM state)
>>>> - Event counters (Error and Non-Error for lanes)
>>>>
>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>
>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>> named based the BDF of Root Port. For example,
>>>>
>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>
>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>
>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>
>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>
>>>> average RX bandwidth can be calculated like this:
>>>>
>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>
>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>
>> I'll review on top to avoid any duplication with Yicong.
> 
> Thank you! It also served as a reminder that I missed Yicong's email. It appears
> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
> overlooking it.
> 
>>
>> Note I've cropped the stuff neither of us commented on so it's
>> easier to spot the feedback.
> 
> Thank you for noting that. My feedback is replied inline.
> 
>>
>> Jonathan
>>
>>>> ---
>>>>  drivers/perf/Kconfig        |   7 +
>>>>  drivers/perf/Makefile       |   1 +
>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>  3 files changed, 714 insertions(+)
>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>
>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>> index 711f82400086..6ff3921d7a62 100644
>>>> --- a/drivers/perf/Kconfig
>>>> +++ b/drivers/perf/Kconfig
>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>  	  event on CN10K platform.
>>>>  
>>>> +config DWC_PCIE_PMU
>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>> +	depends on (ARM64 && PCI)
>>>> +	help
>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>> +	  monitoring event on Yitian 710 platform.
>>
>> The documentation kind of implies this isn't platform specific.
>> If some parts are (such as which events exist) then you may want to push
>> that to userspace / perftool with appropriate matching against specific SoC.
>>
>> If it is generic, then change this text to "event on platform including the Yitian 710."
> 
> It is generic without any platform specific, so I will change it as you expected.
> 
>>
>>>> +
>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>  
>>>>  source "drivers/perf/amlogic/Kconfig"
>>
>>>> new file mode 100644
>>>> index 000000000000..8bfcf6e0662d
>>>> --- /dev/null
>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>> @@ -0,0 +1,706 @@
>>
>> ...
>>
>>>> +
>>>> +struct dwc_pcie_pmu {
>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>
>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>> pointer. I didn't see you hold the root port to avoid the removal.
> 
> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
> pcie_pmu?
> 
>      pcie_pmu->pdev = pci_dev_get();

It could be one option, but it will block removal of the device from userspace. Another option
is to register a PCI bus notifier so that the driver is notified when a device is added or removed
and can handle it, for example by removing the related PMU when a root port is removed.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-08-01 11:46         ` Yicong Yang
@ 2023-08-04  1:39           ` Shuai Xue
  2023-08-04  2:28             ` Yicong Yang
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-08-04  1:39 UTC (permalink / raw)
  To: Yicong Yang, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/8/1 19:46, Yicong Yang wrote:
> On 2023/7/28 20:41, Shuai Xue wrote:
>>
>>
>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>
>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>> provided by each PCIe Root Port.
>>>>>
>>>>> To facilitate collection of statistics the controller provides the
>>>>> following two features for each Root Port:
>>>>>
>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>   low-power LTSSM state)
>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>
>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>
>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>> named based the BDF of Root Port. For example,
>>>>>
>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>
>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>>
>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>
>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>
>>>>> average RX bandwidth can be calculated like this:
>>>>>
>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>>
>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>
>>> I'll review on top to avoid any duplication with Yicong.
>>
>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>> overlooking it.
>>
>>>
>>> Note I've cropped the stuff neither of us commented on so it's
>>> easier to spot the feedback.
>>
>> Thank you for noting that. My feedback is replied inline.
>>
>>>
>>> Jonathan
>>>
>>>>> ---
>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>  drivers/perf/Makefile       |   1 +
>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>  3 files changed, 714 insertions(+)
>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>
>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>> --- a/drivers/perf/Kconfig
>>>>> +++ b/drivers/perf/Kconfig
>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>  	  event on CN10K platform.
>>>>>  
>>>>> +config DWC_PCIE_PMU
>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>> +	depends on (ARM64 && PCI)
>>>>> +	help
>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>> +	  monitoring event on Yitian 710 platform.
>>>
>>> The documentation kind of implies this isn't platform specific.
>>> If some parts are (such as which events exist) then you may want to push
>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>
>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>
>> It is generic without any platform specific, so I will change it as you expected.
>>
>>>
>>>>> +
>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>  
>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>
>>>>> new file mode 100644
>>>>> index 000000000000..8bfcf6e0662d
>>>>> --- /dev/null
>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>> @@ -0,0 +1,706 @@
>>>
>>> ...
>>>
>>>>> +
>>>>> +struct dwc_pcie_pmu {
>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>
>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>
>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>> pcie_pmu?
>>
>>      pcie_pmu->pdev = pci_dev_get();
> 
> It could be one option, but will block the removal of device from userspace. Another option
> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
> it, for example, remove the related PMU on the removal of the root ports.

I see, but can a root port be removed from userspace? I checked the hotplug slot interface,
and no root port is available to power off.

Thank you.

Best Regards,
Shuai

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-08-04  1:39           ` Shuai Xue
@ 2023-08-04  2:28             ` Yicong Yang
  2023-08-04  3:09               ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Yicong Yang @ 2023-08-04  2:28 UTC (permalink / raw)
  To: Shuai Xue, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On 2023/8/4 9:39, Shuai Xue wrote:
> 
> 
> On 2023/8/1 19:46, Yicong Yang wrote:
>> On 2023/7/28 20:41, Shuai Xue wrote:
>>>
>>>
>>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>
>>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>>> provided by each PCIe Root Port.
>>>>>>
>>>>>> To facilitate collection of statistics the controller provides the
>>>>>> following two features for each Root Port:
>>>>>>
>>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>>   low-power LTSSM state)
>>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>>
>>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>>
>>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>>> named based the BDF of Root Port. For example,
>>>>>>
>>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>>
>>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>>>
>>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>>
>>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>>
>>>>>> average RX bandwidth can be calculated like this:
>>>>>>
>>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>>>
>>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>
>>>> I'll review on top to avoid any duplication with Yicong.
>>>
>>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>>> overlooking it.
>>>
>>>>
>>>> Note I've cropped the stuff neither of us commented on so it's
>>>> easier to spot the feedback.
>>>
>>> Thank you for noting that. My feedback is replied inline.
>>>
>>>>
>>>> Jonathan
>>>>
>>>>>> ---
>>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>>  drivers/perf/Makefile       |   1 +
>>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>>  3 files changed, 714 insertions(+)
>>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>>
>>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>>> --- a/drivers/perf/Kconfig
>>>>>> +++ b/drivers/perf/Kconfig
>>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>>  	  event on CN10K platform.
>>>>>>  
>>>>>> +config DWC_PCIE_PMU
>>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>>> +	depends on (ARM64 && PCI)
>>>>>> +	help
>>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>>> +	  monitoring event on Yitian 710 platform.
>>>>
>>>> The documentation kind of implies this isn't platform specific.
>>>> If some parts are (such as which events exist) then you may want to push
>>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>>
>>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>>
>>> It is generic without any platform specific, so I will change it as you expected.
>>>
>>>>
>>>>>> +
>>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>>  
>>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>>
>>>>>> new file mode 100644
>>>>>> index 000000000000..8bfcf6e0662d
>>>>>> --- /dev/null
>>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>>> @@ -0,0 +1,706 @@
>>>>
>>>> ...
>>>>
>>>>>> +
>>>>>> +struct dwc_pcie_pmu {
>>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>>
>>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>>
>>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>>> pcie_pmu?
>>>
>>>      pcie_pmu->pdev = pci_dev_get();
>>
>> It could be one option, but will block the removal of device from userspace. Another option
>> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
>> it, for example, remove the related PMU on the removal of the root ports.
> 
> I see, but can a root port be removed from userspace? I checked the hotplug slot interface,
> and no root port is available to power off.
> 

For hotplug maybe not, but a user can remove a certain device through sysfs:

echo 1 > /sys/bus/pci/devices/<root port>/remove

Thanks.

^ permalink raw reply	[flat|nested] 31+ messages in thread

* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-08-04  2:28             ` Yicong Yang
@ 2023-08-04  3:09               ` Shuai Xue
  2023-10-09 13:08                 ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-08-04  3:09 UTC (permalink / raw)
  To: Yicong Yang, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/8/4 10:28, Yicong Yang wrote:
> On 2023/8/4 9:39, Shuai Xue wrote:
>>
>>
>> On 2023/8/1 19:46, Yicong Yang wrote:
>>> On 2023/7/28 20:41, Shuai Xue wrote:
>>>>
>>>>
>>>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>>
>>>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>>>> provided by each PCIe Root Port.
>>>>>>>
>>>>>>> To facilitate collection of statistics the controller provides the
>>>>>>> following two features for each Root Port:
>>>>>>>
>>>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>>>   low-power LTSSM state)
>>>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>>>
>>>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>>>
>>>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>>>> named based the BDF of Root Port. For example,
>>>>>>>
>>>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>>>
>>>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
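
For reference, the name in this example follows from packing the BDF into 16 bits (bus in
bits 15:8, device in bits 7:3, function in bits 2:0) and printing it in hex. A minimal
userspace sketch, assuming the suffix is simply that value formatted with %x; toy_bdf() and
toy_pmu_name() are hypothetical helpers for illustration, not functions from the patch:

```c
#include <stdio.h>

/* Hypothetical helper: pack bus/device/function the way a PCI BDF is laid out. */
unsigned int toy_bdf(unsigned int bus, unsigned int dev, unsigned int fn)
{
	return ((bus & 0xff) << 8) | ((dev & 0x1f) << 3) | (fn & 0x07);
}

/* Hypothetical helper: build "dwc_rootport_<bdf in hex>" into buf. */
int toy_pmu_name(char *buf, size_t len, unsigned int bus, unsigned int dev,
		 unsigned int fn)
{
	/* Assumption: the suffix is the packed BDF printed with %x. */
	return snprintf(buf, len, "dwc_rootport_%x", toy_bdf(bus, dev, fn));
}
```

With bus 0x30, device 0x03 and function 0, toy_bdf() gives 0x3018, which matches the
name quoted above.
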
>>>>>>>
>>>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>>>
>>>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>>>
>>>>>>> average RX bandwidth can be calculated like this:
>>>>>>>
>>>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
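
As a worked sketch of this calculation, assuming the counter value is in units of 16 bytes
as stated in the commit message and the measurement window is given in seconds;
toy_rx_bandwidth() is a hypothetical name, not part of the driver:

```c
#include <stdint.h>

/* Bytes represented by one counter increment, per the commit message. */
#define TOY_PAYLOAD_UNIT_BYTES 16

/* Hypothetical helper: average RX bandwidth in bytes per second. */
double toy_rx_bandwidth(uint64_t rx_payload_count, double window_seconds)
{
	if (window_seconds <= 0.0)
		return 0.0;	/* no meaningful window */
	return (double)rx_payload_count * TOY_PAYLOAD_UNIT_BYTES / window_seconds;
}
```

For example, a count of 1000 over a 2 second window is 1000 * 16 / 2 = 8000 bytes per
second.
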
>>>>>>>
>>>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>
>>>>> I'll review on top to avoid any duplication with Yicong.
>>>>
>>>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>>>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>>>> overlooking it.
>>>>
>>>>>
>>>>> Note I've cropped the stuff neither of us commented on so it's
>>>>> easier to spot the feedback.
>>>>
>>>> Thank you for noting that. My feedback is replied inline.
>>>>
>>>>>
>>>>> Jonathan
>>>>>
>>>>>>> ---
>>>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>>>  drivers/perf/Makefile       |   1 +
>>>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>>>  3 files changed, 714 insertions(+)
>>>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>>>
>>>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>>>> --- a/drivers/perf/Kconfig
>>>>>>> +++ b/drivers/perf/Kconfig
>>>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>>>  	  event on CN10K platform.
>>>>>>>  
>>>>>>> +config DWC_PCIE_PMU
>>>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>>>> +	depends on (ARM64 && PCI)
>>>>>>> +	help
>>>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>>>> +	  monitoring event on Yitian 710 platform.
>>>>>
>>>>> The documentation kind of implies this isn't platform specific.
>>>>> If some parts are (such as which events exist) then you may want to push
>>>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>>>
>>>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>>>
>>>> It is generic without any platform specific, so I will change it as you expected.
>>>>
>>>>>
>>>>>>> +
>>>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>>>  
>>>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>>>
>>>>>>> new file mode 100644
>>>>>>> index 000000000000..8bfcf6e0662d
>>>>>>> --- /dev/null
>>>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>>>> @@ -0,0 +1,706 @@
>>>>>
>>>>> ...
>>>>>
>>>>>>> +
>>>>>>> +struct dwc_pcie_pmu {
>>>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>>>
>>>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>>>
>>>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>>>> pcie_pmu?
>>>>
>>>>      pcie_pmu->pdev = pci_dev_get();
>>>
>>> It could be one option, but will block the removal of device from userspace. Another option
>>> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
>>> it, for example, remove the related PMU on the removal of the root ports.
>>
>> I see, but can root port be removed from userspace? I check the hotplug slot interface, no root
>> port is available to power off.
>>
> 
> For hotplug maybe not, but a user can remove a given device through sysfs:
> 
> echo 1 > /sys/bus/pci/devices/<root port>/remove
> 

Thank you, I will add a notifier for the removal/add actions.

Best Regards,
Shuai


* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-08-04  3:09               ` Shuai Xue
@ 2023-10-09 13:08                 ` Shuai Xue
  2023-10-10  7:35                   ` Yicong Yang
  0 siblings, 1 reply; 31+ messages in thread
From: Shuai Xue @ 2023-10-09 13:08 UTC (permalink / raw)
  To: Yicong Yang, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/8/4 11:09, Shuai Xue wrote:
> 
> 
> On 2023/8/4 10:28, Yicong Yang wrote:
>> On 2023/8/4 9:39, Shuai Xue wrote:
>>>
>>>
>>> On 2023/8/1 19:46, Yicong Yang wrote:
>>>> On 2023/7/28 20:41, Shuai Xue wrote:
>>>>>
>>>>>
>>>>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>>>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>>>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>>>
>>>>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>>>>> provided by each PCIe Root Port.
>>>>>>>>
>>>>>>>> To facilitate collection of statistics the controller provides the
>>>>>>>> following two features for each Root Port:
>>>>>>>>
>>>>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>>>>   low-power LTSSM state)
>>>>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>>>>
>>>>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>>>>
>>>>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>>>>> named based the BDF of Root Port. For example,
>>>>>>>>
>>>>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>>>>
>>>>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>>>>>
>>>>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>>>>
>>>>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>>>>
>>>>>>>> average RX bandwidth can be calculated like this:
>>>>>>>>
>>>>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>>>>>
>>>>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>
>>>>>> I'll review on top to avoid any duplication with Yicong.
>>>>>
>>>>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>>>>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>>>>> overlooking it.
>>>>>
>>>>>>
>>>>>> Note I've cropped the stuff neither of us commented on so it's
>>>>>> easier to spot the feedback.
>>>>>
>>>>> Thank you for noting that. My feedback is replied inline.
>>>>>
>>>>>>
>>>>>> Jonathan
>>>>>>
>>>>>>>> ---
>>>>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>>>>  drivers/perf/Makefile       |   1 +
>>>>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>>>>  3 files changed, 714 insertions(+)
>>>>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>>>>
>>>>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>>>>> --- a/drivers/perf/Kconfig
>>>>>>>> +++ b/drivers/perf/Kconfig
>>>>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>>>>  	  event on CN10K platform.
>>>>>>>>  
>>>>>>>> +config DWC_PCIE_PMU
>>>>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>>>>> +	depends on (ARM64 && PCI)
>>>>>>>> +	help
>>>>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>>>>> +	  monitoring event on Yitian 710 platform.
>>>>>>
>>>>>> The documentation kind of implies this isn't platform specific.
>>>>>> If some parts are (such as which events exist) then you may want to push
>>>>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>>>>
>>>>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>>>>
>>>>> It is generic without any platform specific, so I will change it as you expected.
>>>>>
>>>>>>
>>>>>>>> +
>>>>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>>>>  
>>>>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>>>>
>>>>>>>> new file mode 100644
>>>>>>>> index 000000000000..8bfcf6e0662d
>>>>>>>> --- /dev/null
>>>>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>>>>> @@ -0,0 +1,706 @@
>>>>>>
>>>>>> ...
>>>>>>
>>>>>>>> +
>>>>>>>> +struct dwc_pcie_pmu {
>>>>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>>>>
>>>>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>>>>
>>>>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>>>>> pcie_pmu?
>>>>>
>>>>>      pcie_pmu->pdev = pci_dev_get();
>>>>
>>>> It could be one option, but will block the removal of device from userspace. Another option
>>>> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
>>>> it, for example, remove the related PMU on the removal of the root ports.
>>>
>>> I see, but can root port be removed from userspace? I check the hotplug slot interface, no root
>>> port is available to power off.
>>>
>>
>> For hotplug maybe not, but a user can remove a given device through sysfs:
>>
>> echo 1 > /sys/bus/pci/devices/<root port>/remove
>>
> 
> Thank you, I will add a notifier for the removal/add actions.
> 
> Best Regards,
> Shuai

Hi, Yicong,

I am confused about adding a notifier with bus_register_notifier(). If I have already added an
action to pdev->dev to unregister the PMU:

		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
		if (ret) {
			pci_err(pcie_pmu->pdev,
				"Error %d registering PMU @%x\n", ret, bdf);
			goto out;
		}
		ret = devm_add_action_or_reset(
			&pdev->dev, dwc_pcie_pmu_unregister_pmu, &pcie_pmu->pmu);

the PMU will be unregistered when the port is removed, so the NULL pointer will never be accessed,
right? Do we still need bus_register_notifier()?
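
A toy userspace model of the behaviour being asked about, as a sketch only. The names
struct toy_device, toy_add_action_or_reset() and toy_device_remove() are made up; they
imitate the semantics assumed here (queued actions run in reverse order when the device goes
away, and the action runs immediately if it cannot be queued). This is not kernel code:

```c
#include <stddef.h>

#define TOY_MAX_ACTIONS 8

typedef void (*toy_action_fn)(void *data);

/* Made-up stand-in for a device with a list of cleanup actions. */
struct toy_device {
	toy_action_fn	action[TOY_MAX_ACTIONS];
	void		*data[TOY_MAX_ACTIONS];
	size_t		nr_actions;
};

/* Made-up stand-in for a registered PMU. */
struct toy_pmu {
	int registered;
};

/* Queue a cleanup action; if it cannot be queued, run it at once and fail. */
int toy_add_action_or_reset(struct toy_device *dev, toy_action_fn fn, void *data)
{
	if (dev->nr_actions == TOY_MAX_ACTIONS) {
		fn(data);
		return -1;
	}
	dev->action[dev->nr_actions] = fn;
	dev->data[dev->nr_actions] = data;
	dev->nr_actions++;
	return 0;
}

/* Device teardown: run the queued actions in reverse order of registration. */
void toy_device_remove(struct toy_device *dev)
{
	while (dev->nr_actions) {
		dev->nr_actions--;
		dev->action[dev->nr_actions](dev->data[dev->nr_actions]);
	}
}

/* Cleanup action: mark the PMU as unregistered. */
void toy_unregister_pmu(void *data)
{
	struct toy_pmu *pmu = data;

	pmu->registered = 0;
}
```

In this model the PMU is gone as soon as toy_device_remove() returns, so nothing is left that
could later touch the removed port.
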

Thank you.

Best Regards,
Shuai



* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-10-09 13:08                 ` Shuai Xue
@ 2023-10-10  7:35                   ` Yicong Yang
  2023-10-10  7:45                     ` Shuai Xue
  0 siblings, 1 reply; 31+ messages in thread
From: Yicong Yang @ 2023-10-10  7:35 UTC (permalink / raw)
  To: Shuai Xue, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song

On 2023/10/9 21:08, Shuai Xue wrote:
> 
> 
> On 2023/8/4 11:09, Shuai Xue wrote:
>>
>>
>> On 2023/8/4 10:28, Yicong Yang wrote:
>>> On 2023/8/4 9:39, Shuai Xue wrote:
>>>>
>>>>
>>>> On 2023/8/1 19:46, Yicong Yang wrote:
>>>>> On 2023/7/28 20:41, Shuai Xue wrote:
>>>>>>
>>>>>>
>>>>>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>>>>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>>>>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>>>>
>>>>>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>>>>>> provided by each PCIe Root Port.
>>>>>>>>>
>>>>>>>>> To facilitate collection of statistics the controller provides the
>>>>>>>>> following two features for each Root Port:
>>>>>>>>>
>>>>>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>>>>>   low-power LTSSM state)
>>>>>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>>>>>
>>>>>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>>>>>
>>>>>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>>>>>> named based the BDF of Root Port. For example,
>>>>>>>>>
>>>>>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>>>>>
>>>>>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>>>>>>
>>>>>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>>>>>
>>>>>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>>>>>
>>>>>>>>> average RX bandwidth can be calculated like this:
>>>>>>>>>
>>>>>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>>>>>>
>>>>>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>>
>>>>>>> I'll review on top to avoid any duplication with Yicong.
>>>>>>
>>>>>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>>>>>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>>>>>> overlooking it.
>>>>>>
>>>>>>>
>>>>>>> Note I've cropped the stuff neither of us commented on so it's
>>>>>>> easier to spot the feedback.
>>>>>>
>>>>>> Thank you for noting that. My feedback is replied inline.
>>>>>>
>>>>>>>
>>>>>>> Jonathan
>>>>>>>
>>>>>>>>> ---
>>>>>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>>>>>  drivers/perf/Makefile       |   1 +
>>>>>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>  3 files changed, 714 insertions(+)
>>>>>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>>>>>
>>>>>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>>>>>> --- a/drivers/perf/Kconfig
>>>>>>>>> +++ b/drivers/perf/Kconfig
>>>>>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>>>>>  	  event on CN10K platform.
>>>>>>>>>  
>>>>>>>>> +config DWC_PCIE_PMU
>>>>>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>>>>>> +	depends on (ARM64 && PCI)
>>>>>>>>> +	help
>>>>>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>>>>>> +	  monitoring event on Yitian 710 platform.
>>>>>>>
>>>>>>> The documentation kind of implies this isn't platform specific.
>>>>>>> If some parts are (such as which events exist) then you may want to push
>>>>>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>>>>>
>>>>>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>>>>>
>>>>>> It is generic without any platform specific, so I will change it as you expected.
>>>>>>
>>>>>>>
>>>>>>>>> +
>>>>>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>>>>>  
>>>>>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>>>>>
>>>>>>>>> new file mode 100644
>>>>>>>>> index 000000000000..8bfcf6e0662d
>>>>>>>>> --- /dev/null
>>>>>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>>>>>> @@ -0,0 +1,706 @@
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>>> +
>>>>>>>>> +struct dwc_pcie_pmu {
>>>>>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>>>>>
>>>>>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>>>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>>>>>
>>>>>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>>>>>> pcie_pmu?
>>>>>>
>>>>>>      pcie_pmu->pdev = pci_dev_get();
>>>>>
>>>>> It could be one option, but will block the removal of device from userspace. Another option
>>>>> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
>>>>> it, for example, remove the related PMU on the removal of the root ports.
>>>>
>>>> I see, but can root port be removed from userspace? I check the hotplug slot interface, no root
>>>> port is available to power off.
>>>>
>>>
>>> For hotplug maybe not, but a user can remove a given device through sysfs:
>>>
>>> echo 1 > /sys/bus/pci/devices/<root port>/remove
>>>
>>
>> Thank you, I will add a notifier for the removal/add actions.
>>
>> Best Regards,
>> Shuai
> 
> Hi, Yicong,
> 
> I am confused about adding a notifier with bus_register_notifier(). If I have already added an
> action to pdev->dev to unregister the PMU:
> 
> 		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
> 		if (ret) {
> 			pci_err(pcie_pmu->pdev,
> 				"Error %d registering PMU @%x\n", ret, bdf);
> 			goto out;
> 		}
> 		ret = devm_add_action_or_reset(
> 			&pdev->dev, dwc_pcie_pmu_unregister_pmu, &pcie_pmu->pmu);
> 
> the PMU will be unregistered when the port is removed, so the NULL pointer will never be accessed,
> right? Do we still need bus_register_notifier()?
> 

It is not necessary. A notifier is one way to notice the device removal and avoid dereferencing
the NULL pointer. If you have another way to avoid the issue, like the one above, that is fine.
Since your PMU is 1:1 related to the root port, adding a devm action to unregister the PMU on
root port removal looks better.
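
For comparison, the notifier option would look roughly like the toy model below. In the
kernel this would hang off bus_register_notifier() and the BUS_NOTIFY_DEL_DEVICE event; here
everything is a made-up userspace stand-in (toy_bus_event, toy_pmu_slot, toy_bus_notify), so
treat it as a sketch of the idea rather than driver code:

```c
#include <stddef.h>

/* Made-up stand-ins for bus events; not the kernel's definitions. */
enum toy_bus_event { TOY_BUS_ADD_DEVICE, TOY_BUS_DEL_DEVICE };

struct toy_pmu_slot {
	unsigned int	bdf;		/* root port this PMU belongs to */
	int		registered;	/* 1 while the PMU is usable */
};

/*
 * Notifier-style callback: on removal of a root port, unregister the PMU
 * that belongs to it. Returns how many PMUs were unregistered.
 */
int toy_bus_notify(struct toy_pmu_slot *slots, size_t nr,
		   enum toy_bus_event event, unsigned int bdf)
{
	int dropped = 0;
	size_t i;

	if (event != TOY_BUS_DEL_DEVICE)
		return 0;

	for (i = 0; i < nr; i++) {
		if (slots[i].bdf == bdf && slots[i].registered) {
			slots[i].registered = 0;
			dropped++;
		}
	}
	return dropped;
}
```

The callback has to match every bus event against its own list of PMUs, while a devm action
is attached to the one device it cleans up after, which is why the latter fits a 1:1 mapping.
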

Thanks.


* Re: [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver
  2023-10-10  7:35                   ` Yicong Yang
@ 2023-10-10  7:45                     ` Shuai Xue
  0 siblings, 0 replies; 31+ messages in thread
From: Shuai Xue @ 2023-10-10  7:45 UTC (permalink / raw)
  To: Yicong Yang, Jonathan Cameron
  Cc: yangyicong, chengyou, kaishen, helgaas, will, baolin.wang,
	robin.murphy, linux-kernel, linux-arm-kernel, linux-pci, rdunlap,
	mark.rutland, zhuo.song



On 2023/10/10 15:35, Yicong Yang wrote:
> On 2023/10/9 21:08, Shuai Xue wrote:
>>
>>
>> On 2023/8/4 11:09, Shuai Xue wrote:
>>>
>>>
>>> On 2023/8/4 10:28, Yicong Yang wrote:
>>>> On 2023/8/4 9:39, Shuai Xue wrote:
>>>>>
>>>>>
>>>>> On 2023/8/1 19:46, Yicong Yang wrote:
>>>>>> On 2023/7/28 20:41, Shuai Xue wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 2023/7/27 17:39, Jonathan Cameron wrote:
>>>>>>>> On Tue, 6 Jun 2023 23:14:07 +0800
>>>>>>>> Yicong Yang <yangyicong@huawei.com> wrote:
>>>>>>>>
>>>>>>>>> On 2023/6/6 15:49, Shuai Xue wrote:
>>>>>>>>>> This commit adds the PCIe Performance Monitoring Unit (PMU) driver support
>>>>>>>>>> for T-Head Yitian SoC chip. Yitian is based on the Synopsys PCI Express
>>>>>>>>>> Core controller IP which provides statistics feature. The PMU is not a PCIe
>>>>>>>>>> Root Complex integrated End Point(RCiEP) device but only register counters
>>>>>>>>>> provided by each PCIe Root Port.
>>>>>>>>>>
>>>>>>>>>> To facilitate collection of statistics the controller provides the
>>>>>>>>>> following two features for each Root Port:
>>>>>>>>>>
>>>>>>>>>> - Time Based Analysis (RX/TX data throughput and time spent in each
>>>>>>>>>>   low-power LTSSM state)
>>>>>>>>>> - Event counters (Error and Non-Error for lanes)
>>>>>>>>>>
>>>>>>>>>> Note, only one counter for each type and does not overflow interrupt.
>>>>>>>>>>
>>>>>>>>>> This driver adds PMU devices for each PCIe Root Port. And the PMU device is
>>>>>>>>>> named based the BDF of Root Port. For example,
>>>>>>>>>>
>>>>>>>>>>     30:03.0 PCI bridge: Device 1ded:8000 (rev 01)
>>>>>>>>>>
>>>>>>>>>> the PMU device name for this Root Port is dwc_rootport_3018.
>>>>>>>>>>
>>>>>>>>>> Example usage of counting PCIe RX TLP data payload (Units of 16 bytes)::
>>>>>>>>>>
>>>>>>>>>>     $# perf stat -a -e dwc_rootport_3018/Rx_PCIe_TLP_Data_Payload/
>>>>>>>>>>
>>>>>>>>>> average RX bandwidth can be calculated like this:
>>>>>>>>>>
>>>>>>>>>>     PCIe RX Bandwidth = PCIE_RX_DATA * 16B / Measure_Time_Window
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Shuai Xue <xueshuai@linux.alibaba.com>
>>>>>>>>>> Reported-by: kernel test robot <lkp@intel.com>
>>>>>>>>>> Link: https://lore.kernel.org/oe-kbuild-all/202305170639.XU3djFZX-lkp@intel.com/
>>>>>>>>>> Reviewed-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>>>>>>>>
>>>>>>>> I'll review on top to avoid any duplication with Yicong.
>>>>>>>
>>>>>>> Thank you! It also served as a reminder that I missed Yicong's email. It appears
>>>>>>> that Thunderbird mistakenly moved his email to the junk folder, resulting in me
>>>>>>> overlooking it.
>>>>>>>
>>>>>>>>
>>>>>>>> Note I've cropped the stuff neither of us commented on so it's
>>>>>>>> easier to spot the feedback.
>>>>>>>
>>>>>>> Thank you for noting that. My feedback is replied inline.
>>>>>>>
>>>>>>>>
>>>>>>>> Jonathan
>>>>>>>>
>>>>>>>>>> ---
>>>>>>>>>>  drivers/perf/Kconfig        |   7 +
>>>>>>>>>>  drivers/perf/Makefile       |   1 +
>>>>>>>>>>  drivers/perf/dwc_pcie_pmu.c | 706 ++++++++++++++++++++++++++++++++++++
>>>>>>>>>>  3 files changed, 714 insertions(+)
>>>>>>>>>>  create mode 100644 drivers/perf/dwc_pcie_pmu.c
>>>>>>>>>>
>>>>>>>>>> diff --git a/drivers/perf/Kconfig b/drivers/perf/Kconfig
>>>>>>>>>> index 711f82400086..6ff3921d7a62 100644
>>>>>>>>>> --- a/drivers/perf/Kconfig
>>>>>>>>>> +++ b/drivers/perf/Kconfig
>>>>>>>>>> @@ -209,6 +209,13 @@ config MARVELL_CN10K_DDR_PMU
>>>>>>>>>>  	  Enable perf support for Marvell DDR Performance monitoring
>>>>>>>>>>  	  event on CN10K platform.
>>>>>>>>>>  
>>>>>>>>>> +config DWC_PCIE_PMU
>>>>>>>>>> +	tristate "Enable Synopsys DesignWare PCIe PMU Support"
>>>>>>>>>> +	depends on (ARM64 && PCI)
>>>>>>>>>> +	help
>>>>>>>>>> +	  Enable perf support for Synopsys DesignWare PCIe PMU Performance
>>>>>>>>>> +	  monitoring event on Yitian 710 platform.
>>>>>>>>
>>>>>>>> The documentation kind of implies this isn't platform specific.
>>>>>>>> If some parts are (such as which events exist) then you may want to push
>>>>>>>> that to userspace / perftool with appropriate matching against specific SoC.
>>>>>>>>
>>>>>>>> If it is generic, then change this text to "event on platform including the Yitian 710."
>>>>>>>
>>>>>>> It is generic without any platform specific, so I will change it as you expected.
>>>>>>>
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>>  source "drivers/perf/arm_cspmu/Kconfig"
>>>>>>>>>>  
>>>>>>>>>>  source "drivers/perf/amlogic/Kconfig"
>>>>>>>>
>>>>>>>>>> new file mode 100644
>>>>>>>>>> index 000000000000..8bfcf6e0662d
>>>>>>>>>> --- /dev/null
>>>>>>>>>> +++ b/drivers/perf/dwc_pcie_pmu.c
>>>>>>>>>> @@ -0,0 +1,706 @@
>>>>>>>>
>>>>>>>> ...
>>>>>>>>
>>>>>>>>>> +
>>>>>>>>>> +struct dwc_pcie_pmu {
>>>>>>>>>> +	struct pci_dev		*pdev;		/* Root Port device */  
>>>>>>>>>
>>>>>>>>> If the root port removed after the probe of this PCIe PMU driver, we'll access the NULL
>>>>>>>>> pointer. I didn't see you hold the root port to avoid the removal.
>>>>>>>
>>>>>>> Do you mean that I should have a reference count of rootport by pci_dev_get() when allocating
>>>>>>> pcie_pmu?
>>>>>>>
>>>>>>>      pcie_pmu->pdev = pci_dev_get();
>>>>>>
>>>>>> It could be one option, but will block the removal of device from userspace. Another option
>>>>>> is to register a PCI bus notifier then on removal/added the driver can get notified and handle
>>>>>> it, for example, remove the related PMU on the removal of the root ports.
>>>>>
>>>>> I see, but can root port be removed from userspace? I check the hotplug slot interface, no root
>>>>> port is available to power off.
>>>>>
>>>>
>>>> For hotplug maybe not, but a user can remove a given device through sysfs:
>>>>
>>>> echo 1 > /sys/bus/pci/devices/<root port>/remove
>>>>
>>>
>>> Thank you, I will add a notifier for the removal/add actions.
>>>
>>> Best Regards,
>>> Shuai
>>
>> Hi, Yicong,
>>
>> I am confused about adding a notifier with bus_register_notifier(). If I have already added an
>> action to pdev->dev to unregister the PMU:
>>
>> 		ret = perf_pmu_register(&pcie_pmu->pmu, name, -1);
>> 		if (ret) {
>> 			pci_err(pcie_pmu->pdev,
>> 				"Error %d registering PMU @%x\n", ret, bdf);
>> 			goto out;
>> 		}
>> 		ret = devm_add_action_or_reset(
>> 			&pdev->dev, dwc_pcie_pmu_unregister_pmu, &pcie_pmu->pmu);
>>
>> the PMU will be unregistered when the port is removed, so the NULL pointer will never be accessed,
>> right? Do we still need bus_register_notifier()?
>>
> 
> It is not necessary. A notifier is one way to notice the device removal and avoid dereferencing
> the NULL pointer. If you have another way to avoid the issue, like the one above, that is fine.
> Since your PMU is 1:1 related to the root port, adding a devm action to unregister the PMU on
> root port removal looks better.
> 
> Thanks.

Hi, Yicong,

Got it. Thank you for your quick feedback :)

Best Regards,
Shuai


end of thread, other threads:[~2023-10-10  7:45 UTC | newest]

Thread overview: 31+ messages
2023-06-06  7:49 [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
2023-06-06  7:49 ` [PATCH v6 1/4] docs: perf: Add description for Synopsys DesignWare PCIe PMU driver Shuai Xue
2023-07-27  8:57   ` Jonathan Cameron
2023-07-27 12:52     ` Shuai Xue
2023-07-28 10:18       ` Jonathan Cameron
2023-06-06  7:49 ` [PATCH v6 2/4] PCI: Add Alibaba Vendor ID to linux/pci_ids.h Shuai Xue
2023-06-06 15:31   ` Bjorn Helgaas
2023-06-07  0:42     ` Shuai Xue
2023-06-06  7:49 ` [PATCH v6 3/4] drivers/perf: add DesignWare PCIe PMU driver Shuai Xue
2023-06-06 15:14   ` Yicong Yang
2023-07-27  9:39     ` Jonathan Cameron
2023-07-28 12:41       ` Shuai Xue
2023-07-28 15:20         ` Jonathan Cameron
2023-08-01 11:46         ` Yicong Yang
2023-08-04  1:39           ` Shuai Xue
2023-08-04  2:28             ` Yicong Yang
2023-08-04  3:09               ` Shuai Xue
2023-10-09 13:08                 ` Shuai Xue
2023-10-10  7:35                   ` Yicong Yang
2023-10-10  7:45                     ` Shuai Xue
2023-07-28  1:31     ` Shuai Xue
2023-06-06  7:49 ` [PATCH v6 4/4] MAINTAINERS: add maintainers for " Shuai Xue
2023-06-16  8:39 ` [PATCH v6 0/4] drivers/perf: add Synopsys DesignWare PCIe PMU driver support Shuai Xue
2023-07-10 12:04   ` Shuai Xue
2023-07-24  2:34     ` Shuai Xue
2023-07-24  9:18       ` Jonathan Cameron
2023-07-24 12:13         ` Shuai Xue
2023-07-25 20:59         ` Bjorn Helgaas
2023-07-27  3:45           ` Shuai Xue
2023-07-28 13:39             ` Will Deacon
2023-07-31  7:30               ` Shuai Xue
