* [PATCH v4 0/8] Add PCIe bandwidth controller
@ 2024-01-05 11:25 Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 1/8] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
                   ` (7 more replies)
  0 siblings, 8 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm
  Cc: linux-kernel, Alex Deucher, Daniel Lezcano, Amit Kucheria,
	Zhang Rui, Ilpo Järvinen

Hi all,

This series adds a PCIe bandwidth controller (bwctrl) and an associated PCIe
cooling driver on the thermal core side for limiting PCIe Link Speed due to
thermal reasons. The PCIe bandwidth controller is a PCI Express bus port
service driver. A cooling device is created for each port the service driver
finds that supports changing speeds.

This series only adds support for controlling PCIe Link Speed. Controlling
PCIe Link Width might also be useful, but AFAIK there is no mechanism for
that until PCIe 6.0 (L0p), so Link Width throttling is not added by this
series.

bwctrl is built on top of the revert of the BW notifications removal.

I tried to look into using the cached link speed values more in code
unrelated to what bwctrl needs, but every case I looked at was non-trivial,
so I left such attempts as further work.

v4:
- Merge Port's and Endpoint's Supported Link Speeds Vectors into
  supported_speeds in the struct pci_bus
- Reuse pcie_get_speed_cap()'s code for pcie_get_supported_speeds()
- Setup supported_speeds with PCI_EXP_LNKCAP2_SLS_2_5GB when no
  Endpoint exists
- Squash revert + add bwctrl patches into one
- Change to use threaded IRQ + IRQF_ONESHOT
- Enable also LABIE / LABS
- Convert Link Speed selection to use bit logic instead of loop
- Allocate before requesting IRQ during probe
- Use devm_*()
- Use u8 for speed_conv array instead of u16
- Removed READ_ONCE()
- Improve changelogs, comments, and Kconfig
- Name functions slightly more consistently
- Use bullet list for RMW protected registers in docs

v3:
- Correct hfi1 shortlog prefix
- Improve error prints in hfi1
- Add L: linux-pci to the MAINTAINERS entry

v2:
- Adds LNKCTL2 to RMW safe list in Documentation/PCI/pciebus-howto.rst
- Renamed cooling devices from PCIe_Port_* to PCIe_Port_Link_Speed_* in
  order to plan for the possibility of adding Link Width cooling devices
  later on
- Moved struct thermal_cooling_device declaration to the correct patch
- Small tweaks to Kconfig texts
- Series rebased to resolve conflict (in the selftest list)


Ilpo Järvinen (8):
  PCI: Protect Link Control 2 Register with RMW locking
  drm/radeon: Use RMW accessors for changing LNKCTL2
  drm/amdgpu: Use RMW accessors for changing LNKCTL2
  RDMA/hfi1: Use RMW accessors for changing LNKCTL2
  PCI: Store all PCIe Supported Link Speeds
  PCI/link: Re-add BW notification portdrv as PCIe BW controller
  thermal: Add PCIe cooling driver
  selftests/pcie_bwctrl: Create selftests

 Documentation/PCI/pciebus-howto.rst           |  14 +-
 MAINTAINERS                                   |   9 +
 drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
 drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
 drivers/gpu/drm/radeon/cik.c                  |  40 +--
 drivers/gpu/drm/radeon/si.c                   |  40 +--
 drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
 drivers/pci/pci.c                             |  59 ++--
 drivers/pci/pcie/Kconfig                      |  12 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/bwctrl.c                     | 269 ++++++++++++++++++
 drivers/pci/pcie/portdrv.c                    |   9 +-
 drivers/pci/pcie/portdrv.h                    |  10 +-
 drivers/pci/probe.c                           |   8 +
 drivers/pci/remove.c                          |   3 +
 drivers/thermal/Kconfig                       |  10 +
 drivers/thermal/Makefile                      |   2 +
 drivers/thermal/pcie_cooling.c                | 107 +++++++
 include/linux/pci-bwctrl.h                    |  33 +++
 include/linux/pci.h                           |  11 +
 include/uapi/linux/pci_regs.h                 |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 ++++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 +++++
 25 files changed, 789 insertions(+), 153 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c
 create mode 100644 drivers/thermal/pcie_cooling.c
 create mode 100644 include/linux/pci-bwctrl.h
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

-- 
2.39.2



* [PATCH v4 1/8] PCI: Protect Link Control 2 Register with RMW locking
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 2/8] drm/radeon: Use RMW accessors for changing LNKCTL2 Ilpo Järvinen
                   ` (6 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Jonathan Corbet, linux-doc,
	linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	Ilpo Järvinen

The PCIe Bandwidth Controller performs RMW accesses to the Link Control 2
Register, which can occur concurrently with other sources of Link Control 2
Register writes. Therefore, add the Link Control 2 Register to the PCI
Express Capability Registers that need RMW locking.
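
As a minimal sketch of the race being addressed (the interleaving and the
driver/bwctrl roles here are illustrative, not taken from this patch):

	/*
	 * Without locking, an open-coded RMW can race with another
	 * LNKCTL2 writer on the same device:
	 *
	 *   driver: pcie_capability_read_word(dev, PCI_EXP_LNKCTL2, &val);
	 *   bwctrl: changes PCI_EXP_LNKCTL2_TLS via its own RMW
	 *   driver: pcie_capability_write_word(dev, PCI_EXP_LNKCTL2, val);
	 *
	 * The final write restores the stale TLS value and the concurrent
	 * update is lost.  With this patch, the locked accessor serializes
	 * both updates:
	 */
	pcie_capability_clear_and_set_word(dev, PCI_EXP_LNKCTL2,
					   PCI_EXP_LNKCTL2_TLS,
					   PCI_EXP_LNKCTL2_TLS_8_0GT);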

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Lukas Wunner <lukas@wunner.de>
---
 Documentation/PCI/pciebus-howto.rst | 14 +++++++++-----
 include/linux/pci.h                 |  1 +
 2 files changed, 10 insertions(+), 5 deletions(-)

diff --git a/Documentation/PCI/pciebus-howto.rst b/Documentation/PCI/pciebus-howto.rst
index a0027e8fb0d0..cd7857dd37aa 100644
--- a/Documentation/PCI/pciebus-howto.rst
+++ b/Documentation/PCI/pciebus-howto.rst
@@ -217,8 +217,12 @@ capability structure except the PCI Express capability structure,
 that is shared between many drivers including the service drivers.
 RMW Capability accessors (pcie_capability_clear_and_set_word(),
 pcie_capability_set_word(), and pcie_capability_clear_word()) protect
-a selected set of PCI Express Capability Registers (Link Control
-Register and Root Control Register). Any change to those registers
-should be performed using RMW accessors to avoid problems due to
-concurrent updates. For the up-to-date list of protected registers,
-see pcie_capability_clear_and_set_word().
+a selected set of PCI Express Capability Registers:
+
+* Link Control Register
+* Root Control Register
+* Link Control 2 Register
+
+Any change to those registers should be performed using RMW accessors to
+avoid problems due to concurrent updates. For the up-to-date list of
+protected registers, see pcie_capability_clear_and_set_word().
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 60ca768bc867..345a3d2a3fcd 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1269,6 +1269,7 @@ static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
 {
 	switch (pos) {
 	case PCI_EXP_LNKCTL:
+	case PCI_EXP_LNKCTL2:
 	case PCI_EXP_RTCTL:
 		return pcie_capability_clear_and_set_word_locked(dev, pos,
 								 clear, set);
-- 
2.39.2



* [PATCH v4 2/8] drm/radeon: Use RMW accessors for changing LNKCTL2
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 1/8] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 3/8] drm/amdgpu: " Ilpo Järvinen
                   ` (5 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Alex Deucher, Christian König,
	Pan, Xinhui, David Airlie, Daniel Vetter, amd-gfx, dri-devel,
	linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	Ilpo Järvinen

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.
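
As a sketch of the conversion pattern applied below (the TLS value is
illustrative only), an open-coded sequence such as

	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
	tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT;
	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);

becomes a single locked RMW call:

	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
					   PCI_EXP_LNKCTL2_TLS,
					   PCI_EXP_LNKCTL2_TLS_5_0GT);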

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/radeon/cik.c | 40 ++++++++++++++----------------------
 drivers/gpu/drm/radeon/si.c  | 40 ++++++++++++++----------------------
 2 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index 10be30366c2b..b5e96a8fc2c1 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -9592,28 +9592,18 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -9627,15 +9617,15 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index a91012447b56..32871ca09a0f 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -7189,28 +7189,18 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -7224,15 +7214,15 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.39.2



* [PATCH v4 3/8] drm/amdgpu: Use RMW accessors for changing LNKCTL2
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 1/8] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 2/8] drm/radeon: Use RMW accessors for changing LNKCTL2 Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 4/8] RDMA/hfi1: " Ilpo Järvinen
                   ` (4 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Alex Deucher, Christian König,
	Pan, Xinhui, David Airlie, Daniel Vetter, amd-gfx, dri-devel,
	linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	Ilpo Järvinen

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 41 ++++++++++++--------------------
 drivers/gpu/drm/amd/amdgpu/si.c  | 41 ++++++++++++--------------------
 2 files changed, 30 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index 4dfaa017cf7f..a3a643254d7a 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1638,28 +1638,18 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE(ixPCIE_LC_CNTL4);
 				tmp &= ~PCIE_LC_CNTL4__LC_SET_QUIESCE_MASK;
@@ -1674,16 +1664,15 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~PCIE_LC_SPEED_CNTL__LC_FORCE_DIS_SW_SPEED_CHANGE_MASK;
 	WREG32_PCIE(ixPCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE(ixPCIE_LC_SPEED_CNTL);
 	speed_cntl |= PCIE_LC_SPEED_CNTL__LC_INITIATE_LINK_SPEED_CHANGE_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index a757526153e5..23e4ef4fff7c 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -2331,28 +2331,18 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 								   gpu_cfg &
 								   PCI_EXP_LNKCTL_HAWD);
 
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -2365,16 +2355,15 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.39.2



* [PATCH v4 4/8] RDMA/hfi1: Use RMW accessors for changing LNKCTL2
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
                   ` (2 preceding siblings ...)
  2024-01-05 11:25 ` [PATCH v4 3/8] drm/amdgpu: " Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 5/8] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
                   ` (3 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Dennis Dalessandro, Jason Gunthorpe,
	Leon Romanovsky, linux-rdma, linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	Ilpo Järvinen, Dean Luick

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Reviewed-by: Dean Luick <dean.luick@cornelisnetworks.com>
---
 drivers/infiniband/hw/hfi1/pcie.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
index 119ec2f1382b..7133964749f8 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -1207,14 +1207,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
 		    (u32)lnkctl2);
 	/* only write to parent if target is not as high as ours */
 	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) < target_vector) {
-		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
-		lnkctl2 |= target_vector;
-		dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
-			    (u32)lnkctl2);
-		ret = pcie_capability_write_word(parent,
-						 PCI_EXP_LNKCTL2, lnkctl2);
+		ret = pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+							 PCI_EXP_LNKCTL2_TLS,
+							 target_vector);
 		if (ret) {
-			dd_dev_err(dd, "Unable to write to PCI config\n");
+			dd_dev_err(dd, "Unable to change parent PCI target speed\n");
 			return_error = 1;
 			goto done;
 		}
@@ -1223,22 +1220,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
 	}
 
 	dd_dev_info(dd, "%s: setting target link speed\n", __func__);
-	ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
+	ret = pcie_capability_clear_and_set_word(dd->pcidev, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS,
+						 target_vector);
 	if (ret) {
-		dd_dev_err(dd, "Unable to read from PCI config\n");
-		return_error = 1;
-		goto done;
-	}
-
-	dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
-		    (u32)lnkctl2);
-	lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
-	lnkctl2 |= target_vector;
-	dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
-		    (u32)lnkctl2);
-	ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
-	if (ret) {
-		dd_dev_err(dd, "Unable to write to PCI config\n");
+		dd_dev_err(dd, "Unable to change device PCI target speed\n");
 		return_error = 1;
 		goto done;
 	}
-- 
2.39.2



* [PATCH v4 5/8] PCI: Store all PCIe Supported Link Speeds
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
                   ` (3 preceding siblings ...)
  2024-01-05 11:25 ` [PATCH v4 4/8] RDMA/hfi1: " Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
                   ` (2 subsequent siblings)
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui,
	Ilpo Järvinen

PCIe bandwidth controller added by a subsequent commit will require
selecting PCIe Link Speeds that are lower than the Maximum Link Speed.

The struct pci_bus only stores max_bus_speed. Even if PCIe r6.1 sec
8.2.1 currently disallows gaps in supported Link Speeds, the
Implementation Note in PCIe r6.1 sec 7.5.3.18, recommends determining
supported Link Speeds using the Supported Link Speeds Vector in the
Link Capabilities 2 Register (when available) to "avoid software being
confused if a future specification defines Links that do not require
support for all slower speeds."

Reuse code in pcie_get_speed_cap() to add pcie_get_supported_speeds()
to query the Supported Link Speeds Vector of a PCIe device. The value
is taken directly from the Supported Link Speeds Vector or synthesized
from the Max Link Speed in the Link Capabilities Register when the Link
Capabilities 2 Register is not available.
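
For example (a hypothetical pre-PCIe-r3.0 device, using the register
definitions rather than values from this patch): if Link Capabilities 2
reads as 0 and the Max Link Speed field in Link Capabilities is 5.0 GT/s,
the vector is synthesized as

	PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB == 0x06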

The Supported Link Speeds Vector in the Link Capabilities 2 Register
corresponds to the bus below on Root Ports and Downstream Ports,
whereas it corresponds to the bus above on Upstream Ports and
Endpoints (PCIe r6.1 sec 7.5.3.18):

	"Supported Link Speeds Vector - This field indicates the
	supported Link speed(s) of the associated Port."

Add supported_speeds into the struct pci_bus that caches the
intersection of the upstream and downstream Supported Link Speeds
Vectors. When the Function 0 is enumerated, calculate the intersection
and set supported_speeds (as per PCIe r6.1 sec 7.5.3.18, the
Multi-Function Devices must have the same speeds for all Functions). If
no Upstream Port or Endpoint exists, supported_speeds is set to
2.5GT/s.
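
A worked example of the intersection (the devices are hypothetical): a
16 GT/s-capable Root Port with an 8 GT/s-capable Endpoint below it yields

	Root Port: 2.5/5/8/16 GT/s            -> 0x1e
	Endpoint:  2.5/5/8 GT/s               -> 0x0e
	bus->supported_speeds = 0x1e & 0x0e   =  0x0e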

supported_speeds contains a set of Link Speeds only in the case where the
PCIe Link Speed can be determined. Root Complex Integrated Endpoints do not
have a well-defined Link Speed because they do not seem to implement either
of the Link Capabilities Registers, which is allowed by PCIe r6.1 sec 7.5.3
(the same limitation applies to determining cur_bus_speed and max_bus_speed,
which are PCI_SPEED_UNKNOWN in that case). This is of no concern from the
PCIe bandwidth controller's point of view because such devices are not
attached to a PCIe Root Port that could be controlled.

The supported_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layout.
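
For reference, the resulting bit layout (matching the PCI_EXP_LNKCAP2_SLS_*
definitions):

	bit 0: reserved (always 0)
	bit 1: 2.5 GT/s    bit 2: 5.0 GT/s    bit 3: 8.0 GT/s
	bit 4: 16.0 GT/s   bit 5: 32.0 GT/s   bit 6: 64.0 GT/s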

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pci.c             | 59 ++++++++++++++++++++++++-----------
 drivers/pci/probe.c           |  8 +++++
 drivers/pci/remove.c          |  3 ++
 include/linux/pci.h           | 10 ++++++
 include/uapi/linux/pci_regs.h |  1 +
 5 files changed, 63 insertions(+), 18 deletions(-)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 55bc3576a985..9c037e90df8a 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -6283,38 +6283,61 @@ u32 pcie_bandwidth_available(struct pci_dev *dev, struct pci_dev **limiting_dev,
 EXPORT_SYMBOL(pcie_bandwidth_available);
 
 /**
- * pcie_get_speed_cap - query for the PCI device's link speed capability
+ * pcie_get_supported_speeds - query Supported Link Speeds Vector
  * @dev: PCI device to query
  *
- * Query the PCI device speed capability.  Return the maximum link speed
- * supported by the device.
+ * Query @dev supported link speeds.
+ *
+ * Implementation Note in PCIe r6.0.1 sec 7.5.3.18 recommends determining
+ * supported link speeds using the Supported Link Speeds Vector in the Link
+ * Capabilities 2 Register (when available).
+ *
+ * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18.
+ *
+ * Without Link Capabilities 2, i.e., prior to PCIe r3.0, Supported Link
+ * Speeds field in Link Capabilities is used and only 2.5 GT/s and 5.0 GT/s
+ * speeds were defined.
+ *
+ * For @dev without the Supported Link Speeds Vector, the field is synthesized
+ * from the Max Link Speed field in the Link Capabilities Register.
+ *
+ * Return: Supported Link Speeds Vector
  */
-enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
+u8 pcie_get_supported_speeds(struct pci_dev *dev)
 {
 	u32 lnkcap2, lnkcap;
+	u8 speeds;
 
-	/*
-	 * Link Capabilities 2 was added in PCIe r3.0, sec 7.8.18.  The
-	 * implementation note there recommends using the Supported Link
-	 * Speeds Vector in Link Capabilities 2 when supported.
-	 *
-	 * Without Link Capabilities 2, i.e., prior to PCIe r3.0, software
-	 * should use the Supported Link Speeds field in Link Capabilities,
-	 * where only 2.5 GT/s and 5.0 GT/s speeds were defined.
-	 */
 	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &lnkcap2);
+	speeds = lnkcap2 & PCI_EXP_LNKCAP2_SLS;
 
 	/* PCIe r3.0-compliant */
-	if (lnkcap2)
-		return PCIE_LNKCAP2_SLS2SPEED(lnkcap2);
+	if (speeds)
+		return speeds;
 
 	pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnkcap);
+
+	/* Synthesize from the Max Link Speed field */
 	if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
-		return PCIE_SPEED_5_0GT;
+		speeds = PCI_EXP_LNKCAP2_SLS_5_0GB | PCI_EXP_LNKCAP2_SLS_2_5GB;
 	else if ((lnkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_2_5GB)
-		return PCIE_SPEED_2_5GT;
+		speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
+
+	return speeds;
+}
+EXPORT_SYMBOL_GPL(pcie_get_supported_speeds);
 
-	return PCI_SPEED_UNKNOWN;
+/**
+ * pcie_get_speed_cap - query for the PCI device's link speed capability
+ * @dev: PCI device to query
+ *
+ * Query the PCI device speed capability.
+ *
+ * Return: the maximum link speed supported by the device.
+ */
+enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev)
+{
+	return PCIE_LNKCAP2_SLS2SPEED(pcie_get_supported_speeds(dev));
 }
 EXPORT_SYMBOL(pcie_get_speed_cap);
 
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index ed6b7f48736a..98700b55e5fd 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -819,6 +819,8 @@ static void pci_set_bus_speed(struct pci_bus *bus)
 
 		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
 		bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
+		if (bus->max_bus_speed != PCI_SPEED_UNKNOWN)
+			bus->supported_speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
 
 		pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
 		pcie_update_link_speed(bus, linksta);
@@ -2538,6 +2540,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
 
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
+	u8 speeds = 0;
 	int ret;
 
 	pci_configure_device(dev);
@@ -2564,11 +2567,16 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 
 	pci_init_capabilities(dev);
 
+	if (bus->self && pci_is_pcie(dev) && PCI_FUNC(dev->devfn) == 0) {
+		speeds = pcie_get_supported_speeds(bus->self) &
+			 pcie_get_supported_speeds(dev);
+	}
 	/*
 	 * Add the device to our list of discovered devices
 	 * and the bus list for fixup functions, etc.
 	 */
 	down_write(&pci_bus_sem);
+	bus->supported_speeds = speeds;
 	list_add_tail(&dev->bus_list, &bus->devices);
 	up_write(&pci_bus_sem);
 
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d749ea8250d6..c492527e994a 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -36,6 +36,9 @@ static void pci_destroy_dev(struct pci_dev *dev)
 	device_del(&dev->dev);
 
 	down_write(&pci_bus_sem);
+	if (pci_is_pcie(dev) && PCI_FUNC(dev->devfn) == 0 &&
+	    dev->bus->max_bus_speed != PCI_SPEED_UNKNOWN)
+		dev->bus->supported_speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 345a3d2a3fcd..1e368d7563c5 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -303,6 +303,7 @@ enum pci_bus_speed {
 	PCI_SPEED_UNKNOWN		= 0xff,
 };
 
+u8 pcie_get_supported_speeds(struct pci_dev *dev);
 enum pci_bus_speed pcie_get_speed_cap(struct pci_dev *dev);
 enum pcie_link_width pcie_get_width_cap(struct pci_dev *dev);
 
@@ -644,6 +645,14 @@ struct pci_bus_resource {
 
 #define PCI_REGION_FLAG_MASK	0x0fU	/* These bits of resource flags tell us the PCI region flags */
 
+/*
+ * @supported_speeds:	PCIe Supported Link Speeds Vector (+ reserved 0 at
+ *			LSB). Combination of downstream and upstream
+ *			Supported Link Speeds Vectors. 0 when speed cannot
+ *			be determined (e.g., for Root Complex Integrated
+ *			Endpoints without the relevant Capability
+ *			Registers).
+ */
 struct pci_bus {
 	struct list_head node;		/* Node in list of buses */
 	struct pci_bus	*parent;	/* Parent bus this bridge is on */
@@ -664,6 +673,7 @@ struct pci_bus {
 	unsigned char	primary;	/* Number of primary bridge */
 	unsigned char	max_bus_speed;	/* enum pci_bus_speed */
 	unsigned char	cur_bus_speed;	/* enum pci_bus_speed */
+	u8		supported_speeds;	/* Supported Link Speeds Vector */
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 	int		domain_nr;
 #endif
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index a39193213ff2..7f929e04222b 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -676,6 +676,7 @@
 #define PCI_EXP_DEVSTA2		0x2a	/* Device Status 2 */
 #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2 0x2c	/* end of v2 EPs w/o link */
 #define PCI_EXP_LNKCAP2		0x2c	/* Link Capabilities 2 */
+#define  PCI_EXP_LNKCAP2_SLS		0x000000fe /* Supported Link Speeds Vector */
 #define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x00000002 /* Supported Speed 2.5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8GT/s */
-- 
2.39.2



* [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
                   ` (4 preceding siblings ...)
  2024-01-05 11:25 ` [PATCH v4 5/8] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-09  9:14   ` Krishna Chaitanya Chundru
  2024-01-05 11:25 ` [PATCH v4 7/8] thermal: Add PCIe cooling driver Ilpo Järvinen
  2024-01-05 11:25 ` [PATCH v4 8/8] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen
  7 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Ilpo Järvinen, linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui

This mostly reverts b4c7d2076b4e ("PCI/LINK: Remove bandwidth
notification") and builds the PCIe bandwidth controller on top of it.

The PCIe bandwidth notifications were first added in commit e8303bb7a75c
("PCI/LINK: Report degraded links via link bandwidth notification") but
later had to be removed. The significant changes compared with the old
bandwidth notification driver include:

1) Don't print the notifications into the kernel log; just keep the Link
   Speed cached in the struct pci_bus updated. While somewhat
   unfortunate, the log spam was the source of complaints that
   eventually led to the removal of the bandwidth notifications driver
   (see the links below for further information).

2) Besides the Link Bandwidth Management Interrupt, also enable the Link
   Autonomous Bandwidth Interrupt to cover the other source of bandwidth
   changes.

3) Use a threaded IRQ with IRQF_ONESHOT to handle Bandwidth Notification
   Interrupts to address the problem fixed in commit 3e82a7f9031f
   ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked").

4) Handle Link Speed updates robustly. Refresh the cached Link Speed
   when enabling Bandwidth Notification Interrupts, and solve the race
   between the Link Speed read and the LBMS/LABS update in
   pcie_bwnotif_irq_thread().

5) Use concurrency safe LNKCTL RMW operations.

6) The driver is now called PCIe bwctrl (bandwidth controller) instead
   of just bandwidth notifications because of increased scope and
   functionality within the driver.

PCIe bandwidth controller introduces an in-kernel API to set PCIe Link
Speed. This new API is intended to be used in an upcoming commit that
adds a thermal cooling device to throttle PCIe bandwidth when thermal
thresholds are reached. No users are introduced in this commit yet.

The PCIe bandwidth control procedure is as follows. The highest speed
supported by both the Port and the PCIe device that is not higher than the
requested speed is selected and written into the Target Link Speed field in
the Link Control 2 Register. The bandwidth controller then retrains the
PCIe Link.
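
A worked example of the selection (following the bit logic in bwctrl.c
below; the particular link configuration is hypothetical): requesting
8 GT/s on a link where both ends support 2.5/5/8/16 GT/s gives

	desired_speeds    = GENMASK(PCI_EXP_LNKCTL2_TLS_8_0GT, 1) = 0b00001110
	supported_speeds  =                                          0b00011110
	Target Link Speed = __fls(0b00001110 & 0b00011110)         = 3 (8.0 GT/s)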

Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus
to keep track of PCIe Link Speed changes. While Bandwidth Notifications
should also be generated when the bandwidth controller alters the PCIe Link
Speed, a few platforms do not deliver the LBMS interrupt after Link
Training as expected. Thus, after changing the Link Speed, the bandwidth
controller makes an additional read of the Link Status Register to ensure
cur_bus_speed is consistent with the new PCIe Link Speed.
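
A hypothetical in-kernel caller of the new API (no caller is added by this
commit; the cooling driver in a later patch is the intended user) might look
like:

	int ret = pcie_bwctrl_set_current_speed(srv, PCIE_SPEED_5_0GT);

	if (ret == -EAGAIN)
		dev_warn(&srv->device,
			 "Link retrained, but the requested speed was not reached\n");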

Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 MAINTAINERS                |   7 +
 drivers/pci/pcie/Kconfig   |  12 ++
 drivers/pci/pcie/Makefile  |   1 +
 drivers/pci/pcie/bwctrl.c  | 263 +++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.c |   9 +-
 drivers/pci/pcie/portdrv.h |  10 +-
 include/linux/pci-bwctrl.h |  17 +++
 7 files changed, 313 insertions(+), 6 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c
 create mode 100644 include/linux/pci-bwctrl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 97f51d5ec1cf..ccc5d50bf340 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16766,6 +16766,13 @@ F:	include/linux/pci*
 F:	include/uapi/linux/pci*
 F:	lib/pci*
 
+PCIE BANDWIDTH CONTROLLER
+M:	Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
+L:	linux-pci@vger.kernel.org
+S:	Supported
+F:	drivers/pci/pcie/bwctrl.c
+F:	include/linux/pci-bwctrl.h
+
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
 M:	Jonathan Chocron <jonnyc@amazon.com>
 L:	linux-pci@vger.kernel.org
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 8999fcebde6a..f1ef6030d2c2 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -146,6 +146,18 @@ config PCIE_PTM
 	  This is only useful if you have devices that support PTM, but it
 	  is safe to enable even if you don't.
 
+config PCIE_BW
+	bool "PCI Express Bandwidth Controller"
+	depends on PCIEPORTBUS
+	help
+	  This enables the PCI Express Bandwidth Controller. The Bandwidth
+	  Controller allows controlling the PCIe Link Speed and listens for
+	  Link Bandwidth Change Notifications. The current Link Speed is
+	  available through /sys/bus/pci/devices/.../current_link_speed.
+
+	  If you know Link Width or Speed changes occur (e.g., to correct
+	  unreliable links), you may answer Y.
+
 config PCIE_EDR
 	bool "PCI Express Error Disconnect Recover support"
 	depends on PCIE_DPC && ACPI
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 8de4ed5f98f1..175065a495cf 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -12,4 +12,5 @@ obj-$(CONFIG_PCIEAER_INJECT)	+= aer_inject.o
 obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
+obj-$(CONFIG_PCIE_BW)		+= bwctrl.o
 obj-$(CONFIG_PCIE_EDR)		+= edr.o
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
new file mode 100644
index 000000000000..e17c98c25f06
--- /dev/null
+++ b/drivers/pci/pcie/bwctrl.c
@@ -0,0 +1,263 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * PCIe bandwidth controller
+ *
+ * Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
+ *
+ * Copyright (C) 2019 Dell Inc
+ * Copyright (C) 2023 Intel Corporation
+ *
+ * The PCIe bandwidth controller provides a way to alter PCIe Link Speeds
+ * and notify the operating system when the Link Width or Speed changes. The
+ * notification capability is required for all Root Ports and Downstream
+ * Ports supporting Link Width wider than x1 and/or multiple Link Speeds.
+ *
+ * This port service driver hooks into the Bandwidth Notification interrupt,
+ * watching for Link Speed changes or links becoming degraded. It updates the
+ * cached Current Link Speed that is exposed to user space through sysfs.
+ */
+
+#define dev_fmt(fmt) "bwctrl: " fmt
+
+#include <linux/bitops.h>
+#include <linux/bits.h>
+#include <linux/device.h>
+#include <linux/errno.h>
+#include <linux/interrupt.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci-bwctrl.h>
+#include <linux/types.h>
+
+#include "../pci.h"
+#include "portdrv.h"
+
+/**
+ * struct pcie_bwctrl_service_data - PCIe bandwidth controller
+ * @set_speed_mutex: serializes link speed changes
+ */
+struct pcie_bwctrl_service_data {
+	struct mutex set_speed_mutex;
+};
+
+static bool pcie_valid_speed(enum pci_bus_speed speed)
+{
+	return (speed >= PCIE_SPEED_2_5GT) && (speed <= PCIE_SPEED_64_0GT);
+}
+
+static u16 pci_bus_speed2lnkctl2(enum pci_bus_speed speed)
+{
+	static const u8 speed_conv[] = {
+		[PCIE_SPEED_2_5GT] = PCI_EXP_LNKCTL2_TLS_2_5GT,
+		[PCIE_SPEED_5_0GT] = PCI_EXP_LNKCTL2_TLS_5_0GT,
+		[PCIE_SPEED_8_0GT] = PCI_EXP_LNKCTL2_TLS_8_0GT,
+		[PCIE_SPEED_16_0GT] = PCI_EXP_LNKCTL2_TLS_16_0GT,
+		[PCIE_SPEED_32_0GT] = PCI_EXP_LNKCTL2_TLS_32_0GT,
+		[PCIE_SPEED_64_0GT] = PCI_EXP_LNKCTL2_TLS_64_0GT,
+	};
+
+	if (WARN_ON_ONCE(!pcie_valid_speed(speed)))
+		return 0;
+
+	return speed_conv[speed];
+}
+
+static inline u16 pcie_supported_speeds2target_speed(u8 supported_speeds)
+{
+	return __fls(supported_speeds);
+}
+
+static void pcie_bwnotif_enable(struct pci_dev *dev)
+{
+	u16 link_status;
+	int ret;
+
+	pcie_capability_set_word(dev, PCI_EXP_LNKCTL,
+				 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
+	pcie_capability_write_word(dev, PCI_EXP_LNKSTA,
+				   PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
+
+	/* Read after enabling notifications to ensure link speed is up to date */
+	ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &link_status);
+	if (ret == PCIBIOS_SUCCESSFUL)
+		pcie_update_link_speed(dev->subordinate, link_status);
+}
+
+static void pcie_bwnotif_disable(struct pci_dev *dev)
+{
+	pcie_capability_clear_word(dev, PCI_EXP_LNKCTL,
+				   PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
+}
+
+static irqreturn_t pcie_bwnotif_irq_thread(int irq, void *context)
+{
+	struct pcie_device *srv = context;
+	struct pci_dev *port = srv->port;
+	u16 link_status, events;
+	int ret;
+
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	events = link_status & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
+
+	if (ret != PCIBIOS_SUCCESSFUL || !events)
+		return IRQ_NONE;
+
+	pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
+
+	/*
+	 * The write to clear LBMS above means that no new interrupt is
+	 * generated for a link speed change that occurs between the above
+	 * LNKSTA read and write. Therefore, re-read the link speed before
+	 * updating the cached value.
+	 */
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	if (ret != PCIBIOS_SUCCESSFUL)
+		return IRQ_HANDLED;
+	pcie_update_link_speed(port->subordinate, link_status);
+
+	return IRQ_HANDLED;
+}
+
+/**
+ * pcie_bwctrl_select_speed - Select Target Link Speed
+ * @srv:	PCIe Port
+ * @speed_req:	requested PCIe Link Speed
+ *
+ * Select the Target Link Speed by taking into account the Supported Link
+ * Speeds of both the Root Port and the Endpoint.
+ *
+ * Return: Target Link Speed (1=2.5GT/s, 2=5GT/s, 3=8GT/s, etc.)
+ */
+static u16 pcie_bwctrl_select_speed(struct pcie_device *srv, enum pci_bus_speed speed_req)
+{
+	struct pci_bus *bus = srv->port->subordinate;
+	u8 desired_speeds;
+
+	if (WARN_ON_ONCE(!bus->supported_speeds))
+		return PCI_EXP_LNKCTL2_TLS_2_5GT;
+
+	desired_speeds = GENMASK(pci_bus_speed2lnkctl2(speed_req),
+				 __fls(PCI_EXP_LNKCAP2_SLS_2_5GB));
+
+	return pcie_supported_speeds2target_speed(bus->supported_speeds & desired_speeds);
+}
+
+/**
+ * pcie_bwctrl_set_current_speed - Set downstream Link Speed for PCIe Port
+ * @srv:	PCIe Port
+ * @speed_req:	requested PCIe Link Speed
+ *
+ * Attempts to set PCIe Port Link Speed to @speed_req. @speed_req may be
+ * adjusted downwards to the best speed supported by both the Port and PCIe
+ * Device underneath it.
+ *
+ * Return:
+ * * 0 		- on success
+ * * -EINVAL 	- @speed_req is not a PCIe Link Speed
+ * * -ETIMEDOUT	- changing Link Speed took too long
+ * * -EAGAIN	- Link Speed was changed but @speed_req was not achieved
+ */
+int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed_req)
+{
+	struct pcie_bwctrl_service_data *data = get_service_data(srv);
+	struct pci_dev *port = srv->port;
+	u16 target_speed, link_status;
+	int ret;
+
+	if (WARN_ON_ONCE(!pcie_valid_speed(speed_req)))
+		return -EINVAL;
+
+	if (port->subordinate->cur_bus_speed == speed_req)
+		return 0;
+
+	target_speed = pcie_bwctrl_select_speed(srv, speed_req);
+
+	mutex_lock(&data->set_speed_mutex);
+	ret = pcie_capability_clear_and_set_word(port, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS, target_speed);
+	if (ret != PCIBIOS_SUCCESSFUL) {
+		ret = pcibios_err_to_errno(ret);
+		goto unlock;
+	}
+
+	ret = pcie_retrain_link(port, true);
+	if (ret < 0)
+		goto unlock;
+
+	/*
+	 * Ensure the link speed also gets updated on platforms that have
+	 * problems with notifications
+	 */
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	if (ret == PCIBIOS_SUCCESSFUL)
+		pcie_update_link_speed(port->subordinate, link_status);
+
+	if (port->subordinate->cur_bus_speed != speed_req)
+		ret = -EAGAIN;
+
+unlock:
+	mutex_unlock(&data->set_speed_mutex);
+
+	return ret;
+}
+
+static int pcie_bwnotif_probe(struct pcie_device *srv)
+{
+	struct pcie_bwctrl_service_data *data;
+	struct pci_dev *port = srv->port;
+	int ret;
+
+	data = devm_kzalloc(&srv->device, sizeof(*data), GFP_KERNEL);
+	if (!data)
+		return -ENOMEM;
+
+	mutex_init(&data->set_speed_mutex);
+	set_service_data(srv, data);
+
+	ret = devm_request_threaded_irq(&srv->device, srv->irq, NULL,
+					pcie_bwnotif_irq_thread,
+					IRQF_SHARED | IRQF_ONESHOT,
+					"PCIe BW ctrl", srv);
+	if (ret)
+		return ret;
+
+	pcie_bwnotif_enable(port);
+	pci_info(port, "enabled with IRQ %d\n", srv->irq);
+
+	return 0;
+}
+
+static void pcie_bwnotif_remove(struct pcie_device *srv)
+{
+	struct pcie_bwctrl_service_data *data = get_service_data(srv);
+
+	pcie_bwnotif_disable(srv->port);
+	mutex_destroy(&data->set_speed_mutex);
+}
+
+static int pcie_bwnotif_suspend(struct pcie_device *srv)
+{
+	pcie_bwnotif_disable(srv->port);
+	return 0;
+}
+
+static int pcie_bwnotif_resume(struct pcie_device *srv)
+{
+	pcie_bwnotif_enable(srv->port);
+	return 0;
+}
+
+static struct pcie_port_service_driver pcie_bwctrl_driver = {
+	.name		= "pcie_bwctrl",
+	.port_type	= PCIE_ANY_PORT,
+	.service	= PCIE_PORT_SERVICE_BWCTRL,
+	.probe		= pcie_bwnotif_probe,
+	.suspend	= pcie_bwnotif_suspend,
+	.resume		= pcie_bwnotif_resume,
+	.remove		= pcie_bwnotif_remove,
+};
+
+int __init pcie_bwctrl_init(void)
+{
+	return pcie_port_service_register(&pcie_bwctrl_driver);
+}
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 14a4b89a3b83..e8a348949d70 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -68,7 +68,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 	 */
 
 	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
-		    PCIE_PORT_SERVICE_BWNOTIF)) {
+		    PCIE_PORT_SERVICE_BWCTRL)) {
 		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
 		*pme = FIELD_GET(PCI_EXP_FLAGS_IRQ, reg16);
 		nvec = *pme + 1;
@@ -150,11 +150,11 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 
 	/* PME, hotplug and bandwidth notification share an MSI/MSI-X vector */
 	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
-		    PCIE_PORT_SERVICE_BWNOTIF)) {
+		    PCIE_PORT_SERVICE_BWCTRL)) {
 		pcie_irq = pci_irq_vector(dev, pme);
 		irqs[PCIE_PORT_SERVICE_PME_SHIFT] = pcie_irq;
 		irqs[PCIE_PORT_SERVICE_HP_SHIFT] = pcie_irq;
-		irqs[PCIE_PORT_SERVICE_BWNOTIF_SHIFT] = pcie_irq;
+		irqs[PCIE_PORT_SERVICE_BWCTRL_SHIFT] = pcie_irq;
 	}
 
 	if (mask & PCIE_PORT_SERVICE_AER)
@@ -271,7 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 
 		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
 		if (linkcap & PCI_EXP_LNKCAP_LBNC)
-			services |= PCIE_PORT_SERVICE_BWNOTIF;
+			services |= PCIE_PORT_SERVICE_BWCTRL;
 	}
 
 	return services;
@@ -829,6 +829,7 @@ static void __init pcie_init_services(void)
 	pcie_pme_init();
 	pcie_dpc_init();
 	pcie_hp_init();
+	pcie_bwctrl_init();
 }
 
 static int __init pcie_portdrv_init(void)
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 1f3803bde7ee..9628bb04a8ff 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -20,8 +20,8 @@
 #define PCIE_PORT_SERVICE_HP		(1 << PCIE_PORT_SERVICE_HP_SHIFT)
 #define PCIE_PORT_SERVICE_DPC_SHIFT	3	/* Downstream Port Containment */
 #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
-#define PCIE_PORT_SERVICE_BWNOTIF_SHIFT	4	/* Bandwidth notification */
-#define PCIE_PORT_SERVICE_BWNOTIF	(1 << PCIE_PORT_SERVICE_BWNOTIF_SHIFT)
+#define PCIE_PORT_SERVICE_BWCTRL_SHIFT	4	/* Bandwidth Controller (notifications) */
+#define PCIE_PORT_SERVICE_BWCTRL	(1 << PCIE_PORT_SERVICE_BWCTRL_SHIFT)
 
 #define PCIE_PORT_DEVICE_MAXSERVICES   5
 
@@ -51,6 +51,12 @@ int pcie_dpc_init(void);
 static inline int pcie_dpc_init(void) { return 0; }
 #endif
 
+#ifdef CONFIG_PCIE_BW
+int pcie_bwctrl_init(void);
+#else
+static inline int pcie_bwctrl_init(void) { return 0; }
+#endif
+
 /* Port Type */
 #define PCIE_ANY_PORT			(~0)
 
diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
new file mode 100644
index 000000000000..be51f6f2b340
--- /dev/null
+++ b/include/linux/pci-bwctrl.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * PCIe bandwidth controller
+ *
+ * Copyright (C) 2023 Intel Corporation
+ */
+
+#ifndef LINUX_PCI_BWCTRL_H
+#define LINUX_PCI_BWCTRL_H
+
+#include <linux/pci.h>
+
+struct pcie_device;
+
+int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed_req);
+
+#endif
-- 
2.39.2



* [PATCH v4 7/8] thermal: Add PCIe cooling driver
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
                   ` (5 preceding siblings ...)
  2024-01-05 11:25 ` [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  2024-01-05 11:56   ` Christophe JAILLET
  2024-01-05 11:25 ` [PATCH v4 8/8] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen
  7 siblings, 1 reply; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Ilpo Järvinen, Daniel Lezcano,
	Zhang Rui, Lukasz Luba, linux-kernel
  Cc: Alex Deucher, Amit Kucheria

Add a thermal cooling driver to provide a path for accessing the PCIe
bandwidth controller through the usual thermal interfaces.

A cooling device is instantiated by the bwctrl service driver for each
controllable PCIe Port.

On the thermal side, state 0 means no throttling, i.e., the maximum
supported PCIe Link Speed.
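
A worked example of the state <-> speed mapping (the Port is hypothetical):
for a 16 GT/s-capable Port, max_state = PCIE_SPEED_16_0GT - PCIE_SPEED_2_5GT
= 3, and the cooling states map as

	state 0 -> 16 GT/s (no throttling)
	state 1 ->  8 GT/s
	state 2 ->  5 GT/s
	state 3 ->  2.5 GT/s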

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
Acked-by: Rafael J. Wysocki <rafael@kernel.org> # From the cooling device interface perspective
---
 MAINTAINERS                    |   1 +
 drivers/pci/pcie/bwctrl.c      |   6 ++
 drivers/thermal/Kconfig        |  10 +++
 drivers/thermal/Makefile       |   2 +
 drivers/thermal/pcie_cooling.c | 107 +++++++++++++++++++++++++++++++++
 include/linux/pci-bwctrl.h     |  16 +++++
 6 files changed, 142 insertions(+)
 create mode 100644 drivers/thermal/pcie_cooling.c

diff --git a/MAINTAINERS b/MAINTAINERS
index ccc5d50bf340..0cacc7e777c5 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16771,6 +16771,7 @@ M:	Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
 L:	linux-pci@vger.kernel.org
 S:	Supported
 F:	drivers/pci/pcie/bwctrl.c
+F:	drivers/thermal/pcie_cooling.c
 F:	include/linux/pci-bwctrl.h
 
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index e17c98c25f06..6c50fabfea39 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -35,9 +35,11 @@
 /**
  * struct pcie_bwctrl_service_data - PCIe bandwidth controller
  * @set_speed_mutex: serializes link speed changes
+ * @cdev: thermal cooling device associated with the port
  */
 struct pcie_bwctrl_service_data {
 	struct mutex set_speed_mutex;
+	struct thermal_cooling_device *cdev;
 };
 
 static bool pcie_valid_speed(enum pci_bus_speed speed)
@@ -224,6 +226,8 @@ static int pcie_bwnotif_probe(struct pcie_device *srv)
 	pcie_bwnotif_enable(port);
 	pci_info(port, "enabled with IRQ %d\n", srv->irq);
 
+	data->cdev = pcie_cooling_device_register(port, srv);
+
 	return 0;
 }
 
@@ -231,6 +235,8 @@ static void pcie_bwnotif_remove(struct pcie_device *srv)
 {
 	struct pcie_bwctrl_service_data *data = get_service_data(srv);
 
+	if (data->cdev)
+		pcie_cooling_device_unregister(data->cdev);
 	pcie_bwnotif_disable(srv->port);
 	mutex_destroy(&data->set_speed_mutex);
 }
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index c81a00fbca7d..3a071396f1c6 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -219,6 +219,16 @@ config DEVFREQ_THERMAL
 
 	  If you want this support, you should say Y here.
 
+config PCIE_THERMAL
+	bool "PCIe cooling support"
+	depends on PCIEPORTBUS
+	select PCIE_BW
+	help
+	  This implements a PCIe cooling mechanism through bandwidth reduction
+	  for PCIe devices.
+
+	  If you want this support, you should say Y here.
+
 config THERMAL_EMULATION
 	bool "Thermal emulation mode support"
 	help
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index c934cab309ae..a0b25a2742b7 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -30,6 +30,8 @@ thermal_sys-$(CONFIG_CPU_IDLE_THERMAL)	+= cpuidle_cooling.o
 # devfreq cooling
 thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
 
+thermal_sys-$(CONFIG_PCIE_THERMAL) += pcie_cooling.o
+
 obj-$(CONFIG_K3_THERMAL)	+= k3_bandgap.o k3_j72xx_bandgap.o
 # platform thermal drivers
 obj-y				+= broadcom/
diff --git a/drivers/thermal/pcie_cooling.c b/drivers/thermal/pcie_cooling.c
new file mode 100644
index 000000000000..5fc52d410161
--- /dev/null
+++ b/drivers/thermal/pcie_cooling.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * PCIe cooling device
+ *
+ * Copyright (C) 2023 Intel Corporation
+ */
+
+#include <linux/build_bug.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/pci-bwctrl.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/thermal.h>
+
+#define COOLING_DEV_TYPE_PREFIX		"PCIe_Port_Link_Speed_"
+
+struct pcie_cooling_device {
+	struct pci_dev *port;
+	struct pcie_device *pdev;
+};
+
+static int pcie_cooling_get_max_level(struct thermal_cooling_device *cdev, unsigned long *state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	/* cooling state 0 is the same as the maximum PCIe speed */
+	*state = pcie_cdev->port->subordinate->max_bus_speed - PCIE_SPEED_2_5GT;
+
+	return 0;
+}
+
+static int pcie_cooling_get_cur_level(struct thermal_cooling_device *cdev, unsigned long *state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	/* cooling state 0 is the same as the maximum PCIe speed */
+	*state = cdev->max_state -
+		 (pcie_cdev->port->subordinate->cur_bus_speed - PCIE_SPEED_2_5GT);
+
+	return 0;
+}
+
+static int pcie_cooling_set_cur_level(struct thermal_cooling_device *cdev, unsigned long state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+	enum pci_bus_speed speed;
+
+	/* cooling state 0 is the same as the maximum PCIe speed */
+	speed = (cdev->max_state - state) + PCIE_SPEED_2_5GT;
+
+	return pcie_bwctrl_set_current_speed(pcie_cdev->pdev, speed);
+}
+
+static struct thermal_cooling_device_ops pcie_cooling_ops = {
+	.get_max_state = pcie_cooling_get_max_level,
+	.get_cur_state = pcie_cooling_get_cur_level,
+	.set_cur_state = pcie_cooling_set_cur_level,
+};
+
+struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+							    struct pcie_device *pdev)
+{
+	struct pcie_cooling_device *pcie_cdev;
+	struct thermal_cooling_device *cdev;
+	size_t name_len;
+	char *name;
+
+	pcie_cdev = kzalloc(sizeof(*pcie_cdev), GFP_KERNEL);
+	if (!pcie_cdev)
+		return ERR_PTR(-ENOMEM);
+
+	pcie_cdev->port = port;
+	pcie_cdev->pdev = pdev;
+
+	name_len = strlen(COOLING_DEV_TYPE_PREFIX) + strlen(pci_name(port)) + 1;
+	name = kzalloc(name_len, GFP_KERNEL);
+	if (!name) {
+		kfree(pcie_cdev);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(name, name_len, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port));
+	cdev = thermal_cooling_device_register(name, pcie_cdev, &pcie_cooling_ops);
+	kfree(name);
+
+	return cdev;
+}
+
+void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	thermal_cooling_device_unregister(cdev);
+	kfree(pcie_cdev);
+}
+
+/* For bus_speed <-> state arithmetic */
+static_assert(PCIE_SPEED_2_5GT + 1 == PCIE_SPEED_5_0GT);
+static_assert(PCIE_SPEED_5_0GT + 1 == PCIE_SPEED_8_0GT);
+static_assert(PCIE_SPEED_8_0GT + 1 == PCIE_SPEED_16_0GT);
+static_assert(PCIE_SPEED_16_0GT + 1 == PCIE_SPEED_32_0GT);
+static_assert(PCIE_SPEED_32_0GT + 1 == PCIE_SPEED_64_0GT);
+
+MODULE_AUTHOR("Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>");
+MODULE_DESCRIPTION("PCIe cooling driver");
diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
index be51f6f2b340..ec5b1c3515d8 100644
--- a/include/linux/pci-bwctrl.h
+++ b/include/linux/pci-bwctrl.h
@@ -11,7 +11,23 @@
 #include <linux/pci.h>
 
 struct pcie_device;
+struct thermal_cooling_device;
 
 int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed_req);
 
+#ifdef CONFIG_PCIE_THERMAL
+struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+							    struct pcie_device *pdev);
+void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev);
+#else
+static inline struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+									  struct pcie_device *pdev)
+{
+	return NULL;
+}
+static inline void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
+{
+}
+#endif
+
 #endif
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* [PATCH v4 8/8] selftests/pcie_bwctrl: Create selftests
  2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
                   ` (6 preceding siblings ...)
  2024-01-05 11:25 ` [PATCH v4 7/8] thermal: Add PCIe cooling driver Ilpo Järvinen
@ 2024-01-05 11:25 ` Ilpo Järvinen
  7 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-05 11:25 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Krishna chaitanya chundru, Srinivas Pandruvada,
	Rafael J. Wysocki, linux-pm, Shuah Khan, Ilpo Järvinen,
	linux-kernel, linux-kselftest
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui

Create selftests for PCIe BW control through the PCIe cooling device
sysfs interface.

First, the BW control selftest finds the PCIe Port to test with. By
default, the PCIe Port with the highest Link Speed is selected, but
another PCIe Port can be specified with the -d parameter.

The actual test steps the cur_state of the cooling device one-by-one
from max_state to what the cur_state was initially. The speed change
is confirmed by observing the current_link_speed for the corresponding
PCIe Port.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 MAINTAINERS                                   |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 ++++++++++++++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++++++++
 5 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index 0cacc7e777c5..b59e5ffdeac0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16773,6 +16773,7 @@ S:	Supported
 F:	drivers/pci/pcie/bwctrl.c
 F:	drivers/thermal/pcie_cooling.c
 F:	include/linux/pci-bwctrl.h
+F:	tools/testing/selftests/pcie_bwctrl/
 
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
 M:	Jonathan Chocron <jonnyc@amazon.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 3b2061d1c1a5..a86f78c80372 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -60,6 +60,7 @@ TARGETS += net/mptcp
 TARGETS += net/openvswitch
 TARGETS += netfilter
 TARGETS += nsfs
+TARGETS += pcie_bwctrl
 TARGETS += perf_events
 TARGETS += pidfd
 TARGETS += pid_namespace
diff --git a/tools/testing/selftests/pcie_bwctrl/Makefile b/tools/testing/selftests/pcie_bwctrl/Makefile
new file mode 100644
index 000000000000..3e84e26341d1
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/Makefile
@@ -0,0 +1,2 @@
+TEST_PROGS = set_pcie_cooling_state.sh
+include ../lib.mk
diff --git a/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh b/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
new file mode 100755
index 000000000000..3a8f91f0309e
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
@@ -0,0 +1,122 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+SYSFS=
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+retval=0
+skipmsg="skip all tests:"
+
+PCIEPORTTYPE="PCIe_Port_Link_Speed"
+
+prerequisite()
+{
+	local ports
+
+	if [ $UID != 0 ]; then
+		echo $skipmsg must be run as root >&2
+		exit $ksft_skip
+	fi
+
+	SYSFS=`mount -t sysfs | head -1 | awk '{ print $3 }'`
+
+	if [ ! -d "$SYSFS" ]; then
+		echo $skipmsg sysfs is not mounted >&2
+		exit $ksft_skip
+	fi
+
+	if ! ls $SYSFS/class/thermal/cooling_device* > /dev/null 2>&1; then
+		echo $skipmsg thermal cooling devices missing >&2
+		exit $ksft_skip
+	fi
+
+	ports=`grep -e "^$PCIEPORTTYPE" $SYSFS/class/thermal/cooling_device*/type | wc -l`
+	if [ $ports -eq 0 ]; then
+		echo $skipmsg pcie cooling devices missing >&2
+		exit $ksft_skip
+	fi
+}
+
+testport=
+find_pcie_port()
+{
+	local patt="$1"
+	local pcieports
+	local max
+	local cur
+	local delta
+	local bestdelta=-1
+
+	pcieports=`grep -l -F -e "$patt" $SYSFS/class/thermal/cooling_device*/type`
+	if [ -z "$pcieports" ]; then
+		return
+	fi
+	pcieports=${pcieports//\/type/}
+	# Find the port with the highest PCIe Link Speed
+	for port in $pcieports; do
+		max=`cat $port/max_state`
+		cur=`cat $port/cur_state`
+		delta=$((max-cur))
+		if [ $delta -gt $bestdelta ]; then
+			testport="$port"
+			bestdelta=$delta
+		fi
+	done
+}
+
+sysfspcidev=
+find_sysfs_pci_dev()
+{
+	local typefile="$1/type"
+	local pcidir
+
+	pcidir="$SYSFS/bus/pci/devices/`sed -e "s|^${PCIEPORTTYPE}_||g" $typefile`"
+
+	if [ -r "$pcidir/current_link_speed" ]; then
+		sysfspcidev="$pcidir/current_link_speed"
+	fi
+}
+
+usage()
+{
+	echo "Usage: $0 [ -d dev ]"
+	echo -e "\t-d: PCIe port BDF string (e.g., 0000:00:04.0)"
+}
+
+pattern="$PCIEPORTTYPE"
+parse_arguments()
+{
+	while getopts d:h opt; do
+		case $opt in
+			h)
+				usage "$0"
+				exit 0
+				;;
+			d)
+				pattern="${PCIEPORTTYPE}_$OPTARG"
+				;;
+			*)
+				usage "$0"
+				exit 0
+				;;
+		esac
+	done
+}
+
+parse_arguments "$@"
+prerequisite
+find_pcie_port "$pattern"
+if [ -z "$testport" ]; then
+	echo $skipmsg "pcie cooling device not found from sysfs" >&2
+	exit $ksft_skip
+fi
+find_sysfs_pci_dev "$testport"
+if [ -z "$sysfspcidev" ]; then
+	echo $skipmsg "PCIe port device not found from sysfs" >&2
+	exit $ksft_skip
+fi
+
+./set_pcie_speed.sh "$testport" "$sysfspcidev"
+retval=$?
+
+exit $retval
diff --git a/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh b/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
new file mode 100755
index 000000000000..584596949312
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
@@ -0,0 +1,67 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+set -e
+
+TESTNAME=set_pcie_speed
+
+declare -a PCIELINKSPEED=(
+	"2.5 GT/s PCIe"
+	"5.0 GT/s PCIe"
+	"8.0 GT/s PCIe"
+	"16.0 GT/s PCIe"
+	"32.0 GT/s PCIe"
+	"64.0 GT/s PCIe"
+)
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+retval=0
+
+coolingdev="$1"
+statefile="$coolingdev/cur_state"
+maxfile="$coolingdev/max_state"
+linkspeedfile="$2"
+
+oldstate=`cat $statefile`
+maxstate=`cat $maxfile`
+
+set_state()
+{
+	local state=$1
+	local linkspeed
+	local expected_linkspeed
+
+	echo $state > $statefile
+
+	sleep 1
+
+	linkspeed="`cat $linkspeedfile`"
+	expected_linkspeed=$((maxstate-state))
+	expected_str="${PCIELINKSPEED[$expected_linkspeed]}"
+	if [ ! "${expected_str}" = "${linkspeed}" ]; then
+		echo "$TESTNAME failed: expected: ${expected_str}; got ${linkspeed}"
+		retval=1
+	fi
+}
+
+cleanup_skip ()
+{
+	set_state $oldstate
+	exit $ksft_skip
+}
+
+trap cleanup_skip EXIT
+
+echo "$TESTNAME: testing states $maxstate .. $oldstate with $coolingdev"
+for i in $(seq $maxstate -1 $oldstate); do
+	set_state "$i"
+done
+
+trap EXIT
+if [ $retval -eq 0 ]; then
+	echo "$TESTNAME [PASS]"
+else
+	echo "$TESTNAME [FAIL]"
+fi
+exit $retval
-- 
2.39.2


^ permalink raw reply related	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 7/8] thermal: Add PCIe cooling driver
  2024-01-05 11:25 ` [PATCH v4 7/8] thermal: Add PCIe cooling driver Ilpo Järvinen
@ 2024-01-05 11:56   ` Christophe JAILLET
  0 siblings, 0 replies; 12+ messages in thread
From: Christophe JAILLET @ 2024-01-05 11:56 UTC (permalink / raw)
  To: Ilpo Järvinen, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc, Krishna chaitanya chundru,
	Srinivas Pandruvada, Rafael J. Wysocki, linux-pm, Daniel Lezcano,
	Zhang Rui, Lukasz Luba, linux-kernel
  Cc: Alex Deucher, Amit Kucheria

Le 05/01/2024 à 12:25, Ilpo Järvinen a écrit :
> Add a thermal cooling driver to provide a path to access the PCIe bandwidth
> controller using the usual thermal interfaces.
> 
> A cooling device is instantiated for controllable PCIe Ports from the
> bwctrl service driver.
> 
> The thermal side state 0 means no throttling, i.e., maximum supported
> PCIe Link Speed.
> 
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> Acked-by: Rafael J. Wysocki <rafael@kernel.org> # From the cooling device interface perspective
> ---

...

> +struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
> +							    struct pcie_device *pdev)
> +{
> +	struct pcie_cooling_device *pcie_cdev;
> +	struct thermal_cooling_device *cdev;
> +	size_t name_len;
> +	char *name;
> +
> +	pcie_cdev = kzalloc(sizeof(*pcie_cdev), GFP_KERNEL);
> +	if (!pcie_cdev)
> +		return ERR_PTR(-ENOMEM);
> +
> +	pcie_cdev->port = port;
> +	pcie_cdev->pdev = pdev;
> +
> +	name_len = strlen(COOLING_DEV_TYPE_PREFIX) + strlen(pci_name(port)) + 1;
> +	name = kzalloc(name_len, GFP_KERNEL);
> +	if (!name) {
> +		kfree(pcie_cdev);
> +		return ERR_PTR(-ENOMEM);
> +	}
> +
> +	snprintf(name, name_len, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port));

Nit: kasprintf() ?
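
Something along these lines would avoid the manual length bookkeeping
(untested, just to illustrate the idea):

	name = kasprintf(GFP_KERNEL, COOLING_DEV_TYPE_PREFIX "%s",
			 pci_name(port));
	if (!name) {
		kfree(pcie_cdev);
		return ERR_PTR(-ENOMEM);
	}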

> +	cdev = thermal_cooling_device_register(name, pcie_cdev, &pcie_cooling_ops);
> +	kfree(name);
> +
> +	return cdev;
> +}



^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller
  2024-01-05 11:25 ` [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
@ 2024-01-09  9:14   ` Krishna Chaitanya Chundru
  2024-01-09 13:28     ` Ilpo Järvinen
  0 siblings, 1 reply; 12+ messages in thread
From: Krishna Chaitanya Chundru @ 2024-01-09  9:14 UTC (permalink / raw)
  To: Ilpo Järvinen, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc, Srinivas Pandruvada, Rafael J. Wysocki,
	linux-pm, linux-kernel
  Cc: Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui


On 1/5/2024 4:55 PM, Ilpo Järvinen wrote:
> This mostly reverts b4c7d2076b4e ("PCI/LINK: Remove bandwidth
> notification") and builds PCIe bandwidth controller on top of it.
>
> The PCIe bandwidth notifications were first added in the commit
> e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> notification") but later had to be removed. The significant changes
> compared with the old bandwidth notification driver include:
>
> 1) Don't print the notifications into kernel log, just keep the Link
>     Speed cached into the struct pci_bus updated. While somewhat
>     unfortunate, the log spam was the source of complaints that
>     eventually led to the removal of the bandwidth notifications driver
>     (see the links below for further information).
>
> 2) Besides the Link Bandwidth Management Interrupt, enable also Link
>     Autonomous Bandwidth Interrupt to cover the other source of
>     bandwidth changes.
>
> 3) Use threaded IRQ with IRQF_ONESHOT to handle Bandwidth Notification
>     Interrupts to address the problem fixed in the commit 3e82a7f9031f
>     ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked").
>
> 4) Handle Link Speed updates robustly. Refresh the cached Link Speed
>     when enabling Bandwidth Notification Interrupts, and solve the race
>     between Link Speed read and LBMS/LABS update in
>     pcie_bandwidth_notification_irq_thread().
>
> 5) Use concurrency safe LNKCTL RMW operations.
>
> 6) The driver is now called PCIe bwctrl (bandwidth controller) instead
>     of just bandwidth notifications because of increased scope and
>     functionality within the driver.
>
> PCIe bandwidth controller introduces an in-kernel API to set PCIe Link
> Speed. This new API is intended to be used in an upcoming commit that
> adds a thermal cooling device to throttle PCIe bandwidth when thermal
> thresholds are reached. No users are introduced in this commit yet.
>
> The PCIe bandwidth control procedure is as follows. The highest speed
> supported by the Port and the PCIe device which is not higher than the
> requested speed is selected and written into the Target Link Speed in
> the Link Control 2 Register. Then bandwidth controller retrains the
> PCIe Link.
>
> Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus
> to keep track of PCIe Link Speed changes. While Bandwidth Notifications
> should also be generated when bandwidth controller alters the PCIe Link
> Speed, a few platforms do not deliver the LBMS interrupt after Link
> Training as expected. Thus, after changing the Link Speed, bandwidth
> controller makes an additional read of the Link Status Register to ensure
> cur_bus_speed is consistent with the new PCIe Link Speed.
>
> Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
> Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
> Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
> Suggested-by: Lukas Wunner <lukas@wunner.de>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> ---
>   MAINTAINERS                |   7 +
>   drivers/pci/pcie/Kconfig   |  12 ++
>   drivers/pci/pcie/Makefile  |   1 +
>   drivers/pci/pcie/bwctrl.c  | 263 +++++++++++++++++++++++++++++++++++++
>   drivers/pci/pcie/portdrv.c |   9 +-
>   drivers/pci/pcie/portdrv.h |  10 +-
>   include/linux/pci-bwctrl.h |  17 +++
>   7 files changed, 313 insertions(+), 6 deletions(-)
>   create mode 100644 drivers/pci/pcie/bwctrl.c
>   create mode 100644 include/linux/pci-bwctrl.h
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 97f51d5ec1cf..ccc5d50bf340 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16766,6 +16766,13 @@ F:	include/linux/pci*
>   F:	include/uapi/linux/pci*
>   F:	lib/pci*
>   
> +PCIE BANDWIDTH CONTROLLER
> +M:	Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
> +L:	linux-pci@vger.kernel.org
> +S:	Supported
> +F:	drivers/pci/pcie/bwctrl.c
> +F:	include/linux/pci-bwctrl.h
> +
>   PCIE DRIVER FOR AMAZON ANNAPURNA LABS
>   M:	Jonathan Chocron <jonnyc@amazon.com>
>   L:	linux-pci@vger.kernel.org
> diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
> index 8999fcebde6a..f1ef6030d2c2 100644
> --- a/drivers/pci/pcie/Kconfig
> +++ b/drivers/pci/pcie/Kconfig
> @@ -146,6 +146,18 @@ config PCIE_PTM
>   	  This is only useful if you have devices that support PTM, but it
>   	  is safe to enable even if you don't.
>   
> +config PCIE_BW
> +	bool "PCI Express Bandwidth Controller"
> +	depends on PCIEPORTBUS
> +	help
> +	  This enables PCI Express Bandwidth Controller. The Bandwidth
> +	  Controller allows controlling PCIe Link Speed and listens for Link
> +	  Bandwidth Change Notifications. The current Link Speed is available
> +	  through /sys/bus/pci/devices/.../current_link_speed.
> +
> +	  If you know Link Width or Speed changes occur (e.g., to correct
> +	  unreliable links), you may answer Y.
> +
>   config PCIE_EDR
>   	bool "PCI Express Error Disconnect Recover support"
>   	depends on PCIE_DPC && ACPI
> diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
> index 8de4ed5f98f1..175065a495cf 100644
> --- a/drivers/pci/pcie/Makefile
> +++ b/drivers/pci/pcie/Makefile
> @@ -12,4 +12,5 @@ obj-$(CONFIG_PCIEAER_INJECT)	+= aer_inject.o
>   obj-$(CONFIG_PCIE_PME)		+= pme.o
>   obj-$(CONFIG_PCIE_DPC)		+= dpc.o
>   obj-$(CONFIG_PCIE_PTM)		+= ptm.o
> +obj-$(CONFIG_PCIE_BW)		+= bwctrl.o
>   obj-$(CONFIG_PCIE_EDR)		+= edr.o
> diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
> new file mode 100644
> index 000000000000..e17c98c25f06
> --- /dev/null
> +++ b/drivers/pci/pcie/bwctrl.c
> @@ -0,0 +1,263 @@
> +// SPDX-License-Identifier: GPL-2.0+
> +/*
> + * PCIe bandwidth controller
> + *
> + * Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
> + *
> + * Copyright (C) 2019 Dell Inc
> + * Copyright (C) 2023 Intel Corporation
> + *
> + * The PCIe bandwidth controller provides a way to alter PCIe Link Speeds
> + * and notify the operating system when the Link Width or Speed changes. The
> + * notification capability is required for all Root Ports and Downstream
> + * Ports supporting Link Width wider than x1 and/or multiple Link Speeds.
> + *
> + * This service port driver hooks into the Bandwidth Notification interrupt
> + * watching for changes or links becoming degraded in operation. It updates
> + * the cached Current Link Speed that is exposed to user space through sysfs.
> + */
> +
> +#define dev_fmt(fmt) "bwctrl: " fmt
> +
> +#include <linux/bitops.h>
> +#include <linux/bits.h>
> +#include <linux/device.h>
> +#include <linux/errno.h>
> +#include <linux/interrupt.h>
> +#include <linux/mutex.h>
> +#include <linux/pci.h>
> +#include <linux/pci-bwctrl.h>
> +#include <linux/types.h>
> +
> +#include "../pci.h"
> +#include "portdrv.h"
> +
> +/**
> + * struct pcie_bwctrl_service_data - PCIe bandwidth controller
> + * @set_speed_mutex: serializes link speed changes
> + */
> +struct pcie_bwctrl_service_data {
> +	struct mutex set_speed_mutex;
> +};
> +
> +static bool pcie_valid_speed(enum pci_bus_speed speed)
> +{
> +	return (speed >= PCIE_SPEED_2_5GT) && (speed <= PCIE_SPEED_64_0GT);
> +}
> +
> +static u16 pci_bus_speed2lnkctl2(enum pci_bus_speed speed)
> +{
> +	static const u8 speed_conv[] = {
> +		[PCIE_SPEED_2_5GT] = PCI_EXP_LNKCTL2_TLS_2_5GT,
> +		[PCIE_SPEED_5_0GT] = PCI_EXP_LNKCTL2_TLS_5_0GT,
> +		[PCIE_SPEED_8_0GT] = PCI_EXP_LNKCTL2_TLS_8_0GT,
> +		[PCIE_SPEED_16_0GT] = PCI_EXP_LNKCTL2_TLS_16_0GT,
> +		[PCIE_SPEED_32_0GT] = PCI_EXP_LNKCTL2_TLS_32_0GT,
> +		[PCIE_SPEED_64_0GT] = PCI_EXP_LNKCTL2_TLS_64_0GT,
> +	};
> +
> +	if (WARN_ON_ONCE(!pcie_valid_speed(speed)))
> +		return 0;
> +
> +	return speed_conv[speed];
> +}
> +
> +static inline u16 pcie_supported_speeds2target_speed(u8 supported_speeds)
> +{
> +	return __fls(supported_speeds);
> +}
> +
> +static void pcie_bwnotif_enable(struct pci_dev *dev)
> +{
> +	u16 link_status;
> +	int ret;
> +
> +	pcie_capability_set_word(dev, PCI_EXP_LNKCTL,
> +				 PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
> +	pcie_capability_write_word(dev, PCI_EXP_LNKSTA,
> +				   PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
> +
> +	/* Read after enabling notifications to ensure link speed is up to date */
> +	ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &link_status);
> +	if (ret == PCIBIOS_SUCCESSFUL)
> +		pcie_update_link_speed(dev->subordinate, link_status);
> +}
> +
> +static void pcie_bwnotif_disable(struct pci_dev *dev)
> +{
> +	pcie_capability_clear_word(dev, PCI_EXP_LNKCTL,
> +				   PCI_EXP_LNKCTL_LBMIE | PCI_EXP_LNKCTL_LABIE);
> +}
> +
> +static irqreturn_t pcie_bwnotif_irq_thread(int irq, void *context)
> +{
> +	struct pcie_device *srv = context;
> +	struct pci_dev *port = srv->port;
> +	u16 link_status, events;
> +	int ret;
> +
> +	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
> +	events = link_status & (PCI_EXP_LNKSTA_LBMS | PCI_EXP_LNKSTA_LABS);
> +
> +	if (ret != PCIBIOS_SUCCESSFUL || !events)
> +		return IRQ_NONE;
> +
> +	pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
> +
> +	/*
> +	 * The write to clear LBMS prevents getting interrupt from the
> +	 * latest link speed when the link speed changes between the above
> +	 * LNKSTA read and write. Therefore, re-read the speed before
> +	 * updating it.
> +	 */
> +	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
> +	if (ret != PCIBIOS_SUCCESSFUL)
> +		return IRQ_HANDLED;
> +	pcie_update_link_speed(port->subordinate, link_status);
> +
> +	return IRQ_HANDLED;
> +}
> +
> +/**
> + * pcie_bwctrl_select_speed - Select Target Link Speed
> + * @srv:	PCIe Port
> + * @speed_req:	requested PCIe Link Speed
> + *
> + * Select Target Link Speed by taking into account Supported Link Speeds of
> + * both the Root Port and the Endpoint.
> + *
> + * Return: Target Link Speed (1=2.5GT/s, 2=5GT/s, 3=8GT/s, etc.)
> + */
> +static u16 pcie_bwctrl_select_speed(struct pcie_device *srv, enum pci_bus_speed speed_req)
> +{
> +	struct pci_bus *bus = srv->port->subordinate;
> +	u8 desired_speeds;
> +
> +	if (WARN_ON_ONCE(!bus->supported_speeds))
> +		return PCI_EXP_LNKCAP2_SLS_2_5GB;
> +
> +	desired_speeds = GENMASK(pci_bus_speed2lnkctl2(speed_req),
> +				 __fls(PCI_EXP_LNKCAP2_SLS_2_5GB));
> +
> +	return pcie_supported_speeds2target_speed(bus->supported_speeds & desired_speeds);
> +}
> +
> +/**
> + * pcie_bwctrl_set_current_speed - Set downstream Link Speed for PCIe Port
> + * @srv:	PCIe Port
> + * @speed_req:	requested PCIe Link Speed
> + *
> + * Attempts to set PCIe Port Link Speed to @speed_req. @speed_req may be
> + * adjusted downwards to the best speed supported by both the Port and PCIe
> + * Device underneath it.
> + *
> + * Return:
> + * * 0 		- on success
> + * * -EINVAL 	- @speed_req is not a PCIe Link Speed
> + * * -ETIMEDOUT	- changing Link Speed took too long
> + * * -EAGAIN	- Link Speed was changed but @speed_req was not achieved
> + */
> +int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed_req)

Hi Ilpo Järvinen,

we want to use this API from a PCIe client driver, but we can't use the
API as the client driver is not aware of the pcie_device structure.

Can you please change this API so that PCIe client drivers can also use
it? And could you please also export this API?
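
For example, something that takes a struct pci_dev * directly would work
for us (just to illustrate the idea, not a concrete proposal):

	int pcie_bwctrl_set_current_speed(struct pci_dev *port,
					  enum pci_bus_speed speed_req);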

Thanks & Regards,

Krishna Chaitanya.

> +{
> +	struct pcie_bwctrl_service_data *data = get_service_data(srv);
> +	struct pci_dev *port = srv->port;
> +	u16 target_speed, link_status;
> +	int ret;
> +
> +	if (WARN_ON_ONCE(!pcie_valid_speed(speed_req)))
> +		return -EINVAL;
> +
> +	if (port->subordinate->cur_bus_speed == speed_req)
> +		return 0;
> +
> +	target_speed = pcie_bwctrl_select_speed(srv, speed_req);
> +
> +	mutex_lock(&data->set_speed_mutex);
> +	ret = pcie_capability_clear_and_set_word(port, PCI_EXP_LNKCTL2,
> +						 PCI_EXP_LNKCTL2_TLS, target_speed);
> +	if (ret != PCIBIOS_SUCCESSFUL) {
> +		ret = pcibios_err_to_errno(ret);
> +		goto unlock;
> +	}
> +
> +	ret = pcie_retrain_link(port, true);
> +	if (ret < 0)
> +		goto unlock;
> +
> +	/*
> +	 * Ensure link speed updates also with platforms that have problems
> +	 * with notifications
> +	 */
> +	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
> +	if (ret == PCIBIOS_SUCCESSFUL)
> +		pcie_update_link_speed(port->subordinate, link_status);
> +
> +	if (port->subordinate->cur_bus_speed != speed_req)
> +		ret = -EAGAIN;
> +
> +unlock:
> +	mutex_unlock(&data->set_speed_mutex);
> +
> +	return ret;
> +}
> +
> +static int pcie_bwnotif_probe(struct pcie_device *srv)
> +{
> +	struct pcie_bwctrl_service_data *data;
> +	struct pci_dev *port = srv->port;
> +	int ret;
> +
> +	data = devm_kzalloc(&srv->device, sizeof(*data), GFP_KERNEL);
> +	if (!data)
> +		return -ENOMEM;
> +
> +	mutex_init(&data->set_speed_mutex);
> +	set_service_data(srv, data);
> +
> +	ret = devm_request_threaded_irq(&srv->device, srv->irq, NULL,
> +					pcie_bwnotif_irq_thread,
> +					IRQF_SHARED | IRQF_ONESHOT,
> +					"PCIe BW ctrl", srv);
> +	if (ret)
> +		return ret;
> +
> +	pcie_bwnotif_enable(port);
> +	pci_info(port, "enabled with IRQ %d\n", srv->irq);
> +
> +	return 0;
> +}
> +
> +static void pcie_bwnotif_remove(struct pcie_device *srv)
> +{
> +	struct pcie_bwctrl_service_data *data = get_service_data(srv);
> +
> +	pcie_bwnotif_disable(srv->port);
> +	mutex_destroy(&data->set_speed_mutex);
> +}
> +
> +static int pcie_bwnotif_suspend(struct pcie_device *srv)
> +{
> +	pcie_bwnotif_disable(srv->port);
> +	return 0;
> +}
> +
> +static int pcie_bwnotif_resume(struct pcie_device *srv)
> +{
> +	pcie_bwnotif_enable(srv->port);
> +	return 0;
> +}
> +
> +static struct pcie_port_service_driver pcie_bwctrl_driver = {
> +	.name		= "pcie_bwctrl",
> +	.port_type	= PCIE_ANY_PORT,
> +	.service	= PCIE_PORT_SERVICE_BWCTRL,
> +	.probe		= pcie_bwnotif_probe,
> +	.suspend	= pcie_bwnotif_suspend,
> +	.resume		= pcie_bwnotif_resume,
> +	.remove		= pcie_bwnotif_remove,
> +};
> +
> +int __init pcie_bwctrl_init(void)
> +{
> +	return pcie_port_service_register(&pcie_bwctrl_driver);
> +}
> diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
> index 14a4b89a3b83..e8a348949d70 100644
> --- a/drivers/pci/pcie/portdrv.c
> +++ b/drivers/pci/pcie/portdrv.c
> @@ -68,7 +68,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
>   	 */
>   
>   	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> -		    PCIE_PORT_SERVICE_BWNOTIF)) {
> +		    PCIE_PORT_SERVICE_BWCTRL)) {
>   		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
>   		*pme = FIELD_GET(PCI_EXP_FLAGS_IRQ, reg16);
>   		nvec = *pme + 1;
> @@ -150,11 +150,11 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
>   
>   	/* PME, hotplug and bandwidth notification share an MSI/MSI-X vector */
>   	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
> -		    PCIE_PORT_SERVICE_BWNOTIF)) {
> +		    PCIE_PORT_SERVICE_BWCTRL)) {
>   		pcie_irq = pci_irq_vector(dev, pme);
>   		irqs[PCIE_PORT_SERVICE_PME_SHIFT] = pcie_irq;
>   		irqs[PCIE_PORT_SERVICE_HP_SHIFT] = pcie_irq;
> -		irqs[PCIE_PORT_SERVICE_BWNOTIF_SHIFT] = pcie_irq;
> +		irqs[PCIE_PORT_SERVICE_BWCTRL_SHIFT] = pcie_irq;
>   	}
>   
>   	if (mask & PCIE_PORT_SERVICE_AER)
> @@ -271,7 +271,7 @@ static int get_port_device_capability(struct pci_dev *dev)
>   
>   		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
>   		if (linkcap & PCI_EXP_LNKCAP_LBNC)
> -			services |= PCIE_PORT_SERVICE_BWNOTIF;
> +			services |= PCIE_PORT_SERVICE_BWCTRL;
>   	}
>   
>   	return services;
> @@ -829,6 +829,7 @@ static void __init pcie_init_services(void)
>   	pcie_pme_init();
>   	pcie_dpc_init();
>   	pcie_hp_init();
> +	pcie_bwctrl_init();
>   }
>   
>   static int __init pcie_portdrv_init(void)
> diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
> index 1f3803bde7ee..9628bb04a8ff 100644
> --- a/drivers/pci/pcie/portdrv.h
> +++ b/drivers/pci/pcie/portdrv.h
> @@ -20,8 +20,8 @@
>   #define PCIE_PORT_SERVICE_HP		(1 << PCIE_PORT_SERVICE_HP_SHIFT)
>   #define PCIE_PORT_SERVICE_DPC_SHIFT	3	/* Downstream Port Containment */
>   #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
> -#define PCIE_PORT_SERVICE_BWNOTIF_SHIFT	4	/* Bandwidth notification */
> -#define PCIE_PORT_SERVICE_BWNOTIF	(1 << PCIE_PORT_SERVICE_BWNOTIF_SHIFT)
> +#define PCIE_PORT_SERVICE_BWCTRL_SHIFT	4	/* Bandwidth Controller (notifications) */
> +#define PCIE_PORT_SERVICE_BWCTRL	(1 << PCIE_PORT_SERVICE_BWCTRL_SHIFT)
>   
>   #define PCIE_PORT_DEVICE_MAXSERVICES   5
>   
> @@ -51,6 +51,12 @@ int pcie_dpc_init(void);
>   static inline int pcie_dpc_init(void) { return 0; }
>   #endif
>   
> +#ifdef CONFIG_PCIE_BW
> +int pcie_bwctrl_init(void);
> +#else
> +static inline int pcie_bwctrl_init(void) { return 0; }
> +#endif
> +
>   /* Port Type */
>   #define PCIE_ANY_PORT			(~0)
>   
> diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
> new file mode 100644
> index 000000000000..be51f6f2b340
> --- /dev/null
> +++ b/include/linux/pci-bwctrl.h
> @@ -0,0 +1,17 @@
> +/* SPDX-License-Identifier: GPL-2.0-only */
> +/*
> + * PCIe bandwidth controller
> + *
> + * Copyright (C) 2023 Intel Corporation
> + */
> +
> +#ifndef LINUX_PCI_BWCTRL_H
> +#define LINUX_PCI_BWCTRL_H
> +
> +#include <linux/pci.h>
> +
> +struct pcie_device;
> +
> +int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed_req);
> +
> +#endif

^ permalink raw reply	[flat|nested] 12+ messages in thread

* Re: [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller
  2024-01-09  9:14   ` Krishna Chaitanya Chundru
@ 2024-01-09 13:28     ` Ilpo Järvinen
  0 siblings, 0 replies; 12+ messages in thread
From: Ilpo Järvinen @ 2024-01-09 13:28 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Srinivas Pandruvada, Rafael J. Wysocki, linux-pm, LKML,
	Alex Deucher, Daniel Lezcano, Amit Kucheria, Zhang Rui

On Tue, 9 Jan 2024, Krishna Chaitanya Chundru wrote:
> On 1/5/2024 4:55 PM, Ilpo Järvinen wrote:
> > This mostly reverts b4c7d2076b4e ("PCI/LINK: Remove bandwidth
> > notification") and builds PCIe bandwidth controller on top of it.
> > 
> > The PCIe bandwidth notifications were first added in the commit
> > e8303bb7a75c ("PCI/LINK: Report degraded links via link bandwidth
> > notification") but later had to be removed. The significant changes
> > compared with the old bandwidth notification driver include:
> > 
> > 1) Don't print the notifications into kernel log, just keep the Link
> >     Speed cached into the struct pci_bus updated. While somewhat
> >     unfortunate, the log spam was the source of complaints that
> >     eventually led to the removal of the bandwidth notifications driver
> >     (see the links below for further information).
> > 
> > 2) Besides the Link Bandwidth Management Interrupt, enable also Link
> >     Autonomous Bandwidth Interrupt to cover the other source of
> >     bandwidth changes.
> > 
> > 3) Use threaded IRQ with IRQF_ONESHOT to handle Bandwidth Notification
> >     Interrupts to address the problem fixed in the commit 3e82a7f9031f
> >     ("PCI/LINK: Supply IRQ handler so level-triggered IRQs are acked").
> > 
> > 4) Handle Link Speed updates robustly. Refresh the cached Link Speed
> >     when enabling Bandwidth Notification Interrupts, and solve the race
> >     between Link Speed read and LBMS/LABS update in
> >     pcie_bandwidth_notification_irq_thread().
> > 
> > 5) Use concurrency safe LNKCTL RMW operations.
> > 
> > 6) The driver is now called PCIe bwctrl (bandwidth controller) instead
> >     of just bandwidth notifications because of increased scope and
> >     functionality within the driver.
> > 
> > PCIe bandwidth controller introduces an in-kernel API to set PCIe Link
> > Speed. This new API is intended to be used in an upcoming commit that
> > adds a thermal cooling device to throttle PCIe bandwidth when thermal
> > thresholds are reached. No users are introduced in this commit yet.
> > 
> > The PCIe bandwidth control procedure is as follows. The highest speed
> > supported by the Port and the PCIe device which is not higher than the
> > requested speed is selected and written into the Target Link Speed in
> > the Link Control 2 Register. Then bandwidth controller retrains the
> > PCIe Link.
> > 
> > Bandwidth Notifications enable the cur_bus_speed in the struct pci_bus
> > to keep track of PCIe Link Speed changes. While Bandwidth Notifications
> > should also be generated when bandwidth controller alters the PCIe Link
> > Speed, a few platforms do not deliver the LBMS interrupt after Link
> > Training as expected. Thus, after changing the Link Speed, bandwidth
> > controller makes an additional read of the Link Status Register to ensure
> > cur_bus_speed is consistent with the new PCIe Link Speed.
> > 
> > Link:
> > https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
> > Link:
> > https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
> > Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
> > Suggested-by: Lukas Wunner <lukas@wunner.de>
> > Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

> > +/**
> > + * pcie_bwctrl_set_current_speed - Set downstream Link Speed for PCIe Port
> > + * @srv:	PCIe Port
> > + * @speed_req:	requested PCIe Link Speed
> > + *
> > + * Attempts to set PCIe Port Link Speed to @speed_req. @speed_req may be
> > + * adjusted downwards to the best speed supported by both the Port and PCIe
> > + * Device underneath it.
> > + *
> > + * Return:
> > + * * 0 		- on success
> > + * * -EINVAL 	- @speed_req is not a PCIe Link Speed
> > + * * -ETIMEDOUT	- changing Link Speed took too long
> > + * * -EAGAIN	- Link Speed was changed but @speed_req was not achieved
> > + */
> > +int pcie_bwctrl_set_current_speed(struct pcie_device *srv, enum
> > pci_bus_speed speed_req)
> 
> we want to use this API from PCIe client driver, but can't use API as client
> driver is not aware of pcie_device structure,
> 
> can you please make changes in this API so that PCIe client drivers can also
> use it. And also can you please export this API.

I'll make the v5 interface based on struct pci_dev *. It will require
looking up the internal data structure though.
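
Roughly what I have in mind (only a sketch, naming and details are not
final):

	int pcie_set_target_speed(struct pci_dev *port, enum pci_bus_speed speed_req)
	{
		struct device *dev;
		int ret;

		dev = pcie_port_find_device(port, PCIE_PORT_SERVICE_BWCTRL);
		if (!dev)
			return -ENODEV;

		ret = pcie_bwctrl_set_current_speed(to_pcie_device(dev), speed_req);
		put_device(dev);

		return ret;
	}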

-- 
 i.

^ permalink raw reply	[flat|nested] 12+ messages in thread

end of thread, other threads:[~2024-01-09 13:28 UTC | newest]

Thread overview: 12+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-01-05 11:25 [PATCH v4 0/8] Add PCIe bandwidth controller Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 1/8] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 2/8] drm/radeon: Use RMW accessors for changing LNKCTL2 Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 3/8] drm/amdgpu: " Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 4/8] RDMA/hfi1: " Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 5/8] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 6/8] PCI/link: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
2024-01-09  9:14   ` Krishna Chaitanya Chundru
2024-01-09 13:28     ` Ilpo Järvinen
2024-01-05 11:25 ` [PATCH v4 7/8] thermal: Add PCIe cooling driver Ilpo Järvinen
2024-01-05 11:56   ` Christophe JAILLET
2024-01-05 11:25 ` [PATCH v4 8/8] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).