All of lore.kernel.org
 help / color / mirror / Atom feed
* [PATCH 00/10] Add PCIe Bandwidth Controller
@ 2023-08-17 12:16 Ilpo Järvinen
  2023-08-17 12:16 ` [PATCH 01/10] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
                   ` (10 more replies)
  0 siblings, 11 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:16 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc
  Cc: linux-kernel, Krishna chaitanya chundru, Srinivas Pandruvada,
	Alex Deucher, Ilpo Järvinen

Hi all,

This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
cooling driver to the thermal core side for limiting PCIe link speed
due to thermal reasons. PCIe bandwidth controller is a PCI express bus
port service driver. A cooling device is created for each port the
service driver finds if they support changing speeds.

bwctrl now is built on top of BW notifications revert. I'm just not
sure what is the best practice when re-adding some old functionality in
a modified form so please let me know if I need to somehow alter that
patch.

The series is based on top of the RMW changes in pci/pcie-rmw.

Ilpo Järvinen (10):
  PCI: Protect Link Control 2 Register with RMW locking
  drm/radeon: Use RMW accessors for changing LNKCTL2
  drm/amdgpu: Use RMW accessors for changing LNKCTL2
  drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
  PCI: Store all PCIe Supported Link Speeds
  PCI: Cache PCIe device's Supported Speed Vector
  PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
  PCI/bwctrl: Add "controller" part into PCIe bwctrl
  thermal: Add PCIe cooling driver
  selftests/pcie_bwctrl: Create selftests

 MAINTAINERS                                   |   8 +
 drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
 drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
 drivers/gpu/drm/radeon/cik.c                  |  40 +--
 drivers/gpu/drm/radeon/si.c                   |  40 +--
 drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
 drivers/pci/pcie/Kconfig                      |   9 +
 drivers/pci/pcie/Makefile                     |   1 +
 drivers/pci/pcie/bwctrl.c                     | 309 ++++++++++++++++++
 drivers/pci/pcie/portdrv.c                    |   9 +-
 drivers/pci/pcie/portdrv.h                    |  10 +-
 drivers/pci/probe.c                           |  38 ++-
 drivers/pci/remove.c                          |   2 +
 drivers/thermal/Kconfig                       |  10 +
 drivers/thermal/Makefile                      |   2 +
 drivers/thermal/pcie_cooling.c                | 107 ++++++
 include/linux/pci-bwctrl.h                    |  33 ++
 include/linux/pci.h                           |   3 +
 include/uapi/linux/pci_regs.h                 |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 +++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++
 23 files changed, 795 insertions(+), 131 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c
 create mode 100644 drivers/thermal/pcie_cooling.c
 create mode 100644 include/linux/pci-bwctrl.h
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

-- 
2.30.2


^ permalink raw reply	[flat|nested] 24+ messages in thread

* [PATCH 01/10] PCI: Protect Link Control 2 Register with RMW locking
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
@ 2023-08-17 12:16 ` Ilpo Järvinen
  2023-08-17 12:17   ` Ilpo Järvinen
                   ` (9 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:16 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

PCIe Bandwidth Controller performs RMW accesses the Link Control 2
Register which can occur concurrently to other sources of Link Control
2 Register writes. Therefore, add Link Control 2 Register among the PCI
Express Capability Registers that need RMW locking.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 include/linux/pci.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7ee498cd1f37..7b2927a90ee0 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -1243,6 +1243,7 @@ static inline int pcie_capability_clear_and_set_word(struct pci_dev *dev,
 {
 	switch (pos) {
 	case PCI_EXP_LNKCTL:
+	case PCI_EXP_LNKCTL2:
 	case PCI_EXP_RTCTL:
 		return pcie_capability_clear_and_set_word_locked(dev, pos,
 								 clear, set);
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/10] drm/radeon: Use RMW accessors for changing LNKCTL2
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
  2023-08-17 12:16 ` [PATCH 01/10] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
@ 2023-08-17 12:17   ` Ilpo Järvinen
  2023-08-17 12:17   ` Ilpo Järvinen
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/radeon/cik.c | 40 ++++++++++++++----------------------
 drivers/gpu/drm/radeon/si.c  | 40 ++++++++++++++----------------------
 2 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index a6f3c811ceb8..b1558b822f9c 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -9592,28 +9592,18 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -9627,15 +9617,15 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index a91012447b56..32871ca09a0f 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -7189,28 +7189,18 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -7224,15 +7214,15 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/10] drm/radeon: Use RMW accessors for changing LNKCTL2
@ 2023-08-17 12:17   ` Ilpo Järvinen
  0 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Alex Deucher, Ilpo Järvinen, Krishna chaitanya chundru,
	Srinivas Pandruvada

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/radeon/cik.c | 40 ++++++++++++++----------------------
 drivers/gpu/drm/radeon/si.c  | 40 ++++++++++++++----------------------
 2 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index a6f3c811ceb8..b1558b822f9c 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -9592,28 +9592,18 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -9627,15 +9617,15 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index a91012447b56..32871ca09a0f 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -7189,28 +7189,18 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -7224,15 +7214,15 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 02/10] drm/radeon: Use RMW accessors for changing LNKCTL2
@ 2023-08-17 12:17   ` Ilpo Järvinen
  0 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Ilpo Järvinen, Krishna chaitanya chundru, Srinivas Pandruvada

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/radeon/cik.c | 40 ++++++++++++++----------------------
 drivers/gpu/drm/radeon/si.c  | 40 ++++++++++++++----------------------
 2 files changed, 30 insertions(+), 50 deletions(-)

diff --git a/drivers/gpu/drm/radeon/cik.c b/drivers/gpu/drm/radeon/cik.c
index a6f3c811ceb8..b1558b822f9c 100644
--- a/drivers/gpu/drm/radeon/cik.c
+++ b/drivers/gpu/drm/radeon/cik.c
@@ -9592,28 +9592,18 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 |
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -9627,15 +9617,15 @@ static void cik_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
diff --git a/drivers/gpu/drm/radeon/si.c b/drivers/gpu/drm/radeon/si.c
index a91012447b56..32871ca09a0f 100644
--- a/drivers/gpu/drm/radeon/si.c
+++ b/drivers/gpu/drm/radeon/si.c
@@ -7189,28 +7189,18 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(rdev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(rdev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -7224,15 +7214,15 @@ static void si_pcie_gen3_enable(struct radeon_device *rdev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(rdev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
+	tmp16 = 0;
 	if (speed_cap == PCIE_SPEED_8_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (speed_cap == PCIE_SPEED_5_0GT)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(rdev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(rdev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/10] drm/amdgpu: Use RMW accessors for changing LNKCTL2
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
  2023-08-17 12:16 ` [PATCH 01/10] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
@ 2023-08-17 12:17   ` Ilpo Järvinen
  2023-08-17 12:17   ` Ilpo Järvinen
                     ` (8 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 41 ++++++++++++--------------------
 drivers/gpu/drm/amd/amdgpu/si.c  | 41 ++++++++++++--------------------
 2 files changed, 30 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index e63abdf52b6c..7bcd41996927 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1638,28 +1638,18 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE(ixPCIE_LC_CNTL4);
 				tmp &= ~PCIE_LC_CNTL4__LC_SET_QUIESCE_MASK;
@@ -1674,16 +1664,15 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~PCIE_LC_SPEED_CNTL__LC_FORCE_DIS_SW_SPEED_CHANGE_MASK;
 	WREG32_PCIE(ixPCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE(ixPCIE_LC_SPEED_CNTL);
 	speed_cntl |= PCIE_LC_SPEED_CNTL__LC_INITIATE_LINK_SPEED_CHANGE_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 4b81f29e5fd5..8ea60fdd1b1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -2331,28 +2331,18 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 								   gpu_cfg &
 								   PCI_EXP_LNKCTL_HAWD);
 
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -2365,16 +2355,15 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/10] drm/amdgpu: Use RMW accessors for changing LNKCTL2
@ 2023-08-17 12:17   ` Ilpo Järvinen
  0 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Alex Deucher, Ilpo Järvinen, Krishna chaitanya chundru,
	Srinivas Pandruvada

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 41 ++++++++++++--------------------
 drivers/gpu/drm/amd/amdgpu/si.c  | 41 ++++++++++++--------------------
 2 files changed, 30 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index e63abdf52b6c..7bcd41996927 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1638,28 +1638,18 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE(ixPCIE_LC_CNTL4);
 				tmp &= ~PCIE_LC_CNTL4__LC_SET_QUIESCE_MASK;
@@ -1674,16 +1664,15 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~PCIE_LC_SPEED_CNTL__LC_FORCE_DIS_SW_SPEED_CHANGE_MASK;
 	WREG32_PCIE(ixPCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE(ixPCIE_LC_SPEED_CNTL);
 	speed_cntl |= PCIE_LC_SPEED_CNTL__LC_INITIATE_LINK_SPEED_CHANGE_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 4b81f29e5fd5..8ea60fdd1b1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -2331,28 +2331,18 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 								   gpu_cfg &
 								   PCI_EXP_LNKCTL_HAWD);
 
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -2365,16 +2355,15 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 03/10] drm/amdgpu: Use RMW accessors for changing LNKCTL2
@ 2023-08-17 12:17   ` Ilpo Järvinen
  0 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Alex Deucher, Christian König, Pan, Xinhui, David Airlie,
	Daniel Vetter, amd-gfx, dri-devel, linux-kernel
  Cc: Ilpo Järvinen, Krishna chaitanya chundru, Srinivas Pandruvada

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/gpu/drm/amd/amdgpu/cik.c | 41 ++++++++++++--------------------
 drivers/gpu/drm/amd/amdgpu/si.c  | 41 ++++++++++++--------------------
 2 files changed, 30 insertions(+), 52 deletions(-)

diff --git a/drivers/gpu/drm/amd/amdgpu/cik.c b/drivers/gpu/drm/amd/amdgpu/cik.c
index e63abdf52b6c..7bcd41996927 100644
--- a/drivers/gpu/drm/amd/amdgpu/cik.c
+++ b/drivers/gpu/drm/amd/amdgpu/cik.c
@@ -1638,28 +1638,18 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 								   PCI_EXP_LNKCTL_HAWD);
 
 				/* linkctl2 */
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE(ixPCIE_LC_CNTL4);
 				tmp &= ~PCIE_LC_CNTL4__LC_SET_QUIESCE_MASK;
@@ -1674,16 +1664,15 @@ static void cik_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~PCIE_LC_SPEED_CNTL__LC_FORCE_DIS_SW_SPEED_CHANGE_MASK;
 	WREG32_PCIE(ixPCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE(ixPCIE_LC_SPEED_CNTL);
 	speed_cntl |= PCIE_LC_SPEED_CNTL__LC_INITIATE_LINK_SPEED_CHANGE_MASK;
diff --git a/drivers/gpu/drm/amd/amdgpu/si.c b/drivers/gpu/drm/amd/amdgpu/si.c
index 4b81f29e5fd5..8ea60fdd1b1d 100644
--- a/drivers/gpu/drm/amd/amdgpu/si.c
+++ b/drivers/gpu/drm/amd/amdgpu/si.c
@@ -2331,28 +2331,18 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 								   gpu_cfg &
 								   PCI_EXP_LNKCTL_HAWD);
 
-				pcie_capability_read_word(root, PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (bridge_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(root,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
-
-				pcie_capability_read_word(adev->pdev,
-							  PCI_EXP_LNKCTL2,
-							  &tmp16);
-				tmp16 &= ~(PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN);
-				tmp16 |= (gpu_cfg2 &
-					  (PCI_EXP_LNKCTL2_ENTER_COMP |
-					   PCI_EXP_LNKCTL2_TX_MARGIN));
-				pcie_capability_write_word(adev->pdev,
-							   PCI_EXP_LNKCTL2,
-							   tmp16);
+				pcie_capability_clear_and_set_word(root, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   bridge_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
+				pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+								   PCI_EXP_LNKCTL2_ENTER_COMP |
+								   PCI_EXP_LNKCTL2_TX_MARGIN,
+								   gpu_cfg2 &
+								   (PCI_EXP_LNKCTL2_ENTER_COMP |
+								    PCI_EXP_LNKCTL2_TX_MARGIN));
 
 				tmp = RREG32_PCIE_PORT(PCIE_LC_CNTL4);
 				tmp &= ~LC_SET_QUIESCE;
@@ -2365,16 +2355,15 @@ static void si_pcie_gen3_enable(struct amdgpu_device *adev)
 	speed_cntl &= ~LC_FORCE_DIS_SW_SPEED_CHANGE;
 	WREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL, speed_cntl);
 
-	pcie_capability_read_word(adev->pdev, PCI_EXP_LNKCTL2, &tmp16);
-	tmp16 &= ~PCI_EXP_LNKCTL2_TLS;
-
+	tmp16 = 0;
 	if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN3)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_8_0GT; /* gen3 */
 	else if (adev->pm.pcie_gen_mask & CAIL_PCIE_LINK_SPEED_SUPPORT_GEN2)
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_5_0GT; /* gen2 */
 	else
 		tmp16 |= PCI_EXP_LNKCTL2_TLS_2_5GT; /* gen1 */
-	pcie_capability_write_word(adev->pdev, PCI_EXP_LNKCTL2, tmp16);
+	pcie_capability_clear_and_set_word(adev->pdev, PCI_EXP_LNKCTL2,
+					   PCI_EXP_LNKCTL2_TLS, tmp16);
 
 	speed_cntl = RREG32_PCIE_PORT(PCIE_LC_SPEED_CNTL);
 	speed_cntl |= LC_INITIATE_LINK_SPEED_CHANGE;
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 04/10] drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (2 preceding siblings ...)
  2023-08-17 12:17   ` Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-17 12:17 ` [PATCH 05/10] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
                   ` (6 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Dennis Dalessandro, Jason Gunthorpe, Leon Romanovsky, linux-rdma,
	linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

Don't assume that only the driver would be accessing LNKCTL2. In the
case of upstream (parent), the driver does not even own the device it's
changing the registers for.

Use RMW capability accessors which do proper locking to avoid losing
concurrent updates to the register value. This change is also useful as
a cleanup.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/infiniband/hw/hfi1/pcie.c | 30 ++++++++----------------------
 1 file changed, 8 insertions(+), 22 deletions(-)

diff --git a/drivers/infiniband/hw/hfi1/pcie.c b/drivers/infiniband/hw/hfi1/pcie.c
index 08732e1ac966..60a177f52eb5 100644
--- a/drivers/infiniband/hw/hfi1/pcie.c
+++ b/drivers/infiniband/hw/hfi1/pcie.c
@@ -1212,14 +1212,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
 		    (u32)lnkctl2);
 	/* only write to parent if target is not as high as ours */
 	if ((lnkctl2 & PCI_EXP_LNKCTL2_TLS) < target_vector) {
-		lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
-		lnkctl2 |= target_vector;
-		dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
-			    (u32)lnkctl2);
-		ret = pcie_capability_write_word(parent,
-						 PCI_EXP_LNKCTL2, lnkctl2);
+		ret = pcie_capability_clear_and_set_word(parent, PCI_EXP_LNKCTL2,
+							 PCI_EXP_LNKCTL2_TLS,
+							 target_vector);
 		if (ret) {
-			dd_dev_err(dd, "Unable to write to PCI config\n");
+			dd_dev_err(dd, "Unable to change PCI target speed\n");
 			return_error = 1;
 			goto done;
 		}
@@ -1228,22 +1225,11 @@ int do_pcie_gen3_transition(struct hfi1_devdata *dd)
 	}
 
 	dd_dev_info(dd, "%s: setting target link speed\n", __func__);
-	ret = pcie_capability_read_word(dd->pcidev, PCI_EXP_LNKCTL2, &lnkctl2);
+	ret = pcie_capability_clear_and_set_word(dd->pcidev, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS,
+						 target_vector);
 	if (ret) {
-		dd_dev_err(dd, "Unable to read from PCI config\n");
-		return_error = 1;
-		goto done;
-	}
-
-	dd_dev_info(dd, "%s: ..old link control2: 0x%x\n", __func__,
-		    (u32)lnkctl2);
-	lnkctl2 &= ~PCI_EXP_LNKCTL2_TLS;
-	lnkctl2 |= target_vector;
-	dd_dev_info(dd, "%s: ..new link control2: 0x%x\n", __func__,
-		    (u32)lnkctl2);
-	ret = pcie_capability_write_word(dd->pcidev, PCI_EXP_LNKCTL2, lnkctl2);
-	if (ret) {
-		dd_dev_err(dd, "Unable to write to PCI config\n");
+		dd_dev_err(dd, "Unable to change PCI target speed\n");
 		return_error = 1;
 		goto done;
 	}
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 05/10] PCI: Store all PCIe Supported Link Speeds
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (3 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 04/10] drm/IB/hfi1: " Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-17 12:17 ` [PATCH 06/10] PCI: Cache PCIe device's Supported Speed Vector Ilpo Järvinen
                   ` (5 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

struct pci_bus stores max_bus_speed. Implementation Note in PCIe r6.0.1
sec 7.5.3.18, however, recommends determining supported Link Speeds
using the Supported Link Speeds Vector in the Link Capabilities 2
Register (when available).

Add pcie_bus_speeds into struct pci_bus which caches the Supported Link
Speeds. The value is taken directly from the Supported Link Speeds
Vector or synthetized from the Max Link Speed in the Link Capabilities
Register when the Link Capabilities 2 Register is not available.

pcie_bus_speeds field keeps the extra reserved zero at the least
significant bit to match the Link Capabilities 2 Register layouting.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/probe.c           | 28 +++++++++++++++++++++++++++-
 include/linux/pci.h           |  1 +
 include/uapi/linux/pci_regs.h |  1 +
 3 files changed, 29 insertions(+), 1 deletion(-)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index f1587fb0ba71..586d44b5ed7a 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -768,6 +768,29 @@ static enum pci_bus_speed agp_speed(int agp3, int agpstat)
 	return agp_speeds[index];
 }
 
+/*
+ * Implementation Note in PCIe r6.0.1 sec 7.5.3.18 recommends determining
+ * supported link speeds using the Supported Link Speeds Vector in the Link
+ * Capabilities 2 Register (when available).
+ */
+static u8 pcie_get_supported_speeds(u32 linkcap, u32 linkcap2)
+{
+	u8 speeds;
+
+	speeds = linkcap2 & PCI_EXP_LNKCAP2_SLS;
+	if (speeds)
+		return speeds;
+
+	/*
+	 * Synthetize supported link speeds from the Max Link Speed in the
+	 * Link Capabilities Register.
+	 */
+	speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
+	if ((linkcap & PCI_EXP_LNKCAP_SLS) == PCI_EXP_LNKCAP_SLS_5_0GB)
+		speeds |= PCI_EXP_LNKCAP2_SLS_5_0GB;
+	return speeds;
+}
+
 static void pci_set_bus_speed(struct pci_bus *bus)
 {
 	struct pci_dev *bridge = bus->self;
@@ -815,12 +838,15 @@ static void pci_set_bus_speed(struct pci_bus *bus)
 	}
 
 	if (pci_is_pcie(bridge)) {
-		u32 linkcap;
+		u32 linkcap, linkcap2;
 		u16 linksta;
 
 		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP, &linkcap);
 		bus->max_bus_speed = pcie_link_speed[linkcap & PCI_EXP_LNKCAP_SLS];
 
+		pcie_capability_read_dword(bridge, PCI_EXP_LNKCAP2, &linkcap2);
+		bus->pcie_bus_speeds = pcie_get_supported_speeds(linkcap, linkcap2);
+
 		pcie_capability_read_word(bridge, PCI_EXP_LNKSTA, &linksta);
 		pcie_update_link_speed(bus, linksta);
 	}
diff --git a/include/linux/pci.h b/include/linux/pci.h
index 7b2927a90ee0..d14d92bb7eba 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -664,6 +664,7 @@ struct pci_bus {
 	unsigned char	primary;	/* Number of primary bridge */
 	unsigned char	max_bus_speed;	/* enum pci_bus_speed */
 	unsigned char	cur_bus_speed;	/* enum pci_bus_speed */
+	u8		pcie_bus_speeds;/* Supported Link Speeds Vector (+ reserved 0 at LSB) */
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 	int		domain_nr;
 #endif
diff --git a/include/uapi/linux/pci_regs.h b/include/uapi/linux/pci_regs.h
index e5f558d96493..2b27e4f6854a 100644
--- a/include/uapi/linux/pci_regs.h
+++ b/include/uapi/linux/pci_regs.h
@@ -674,6 +674,7 @@
 #define PCI_EXP_DEVSTA2		0x2a	/* Device Status 2 */
 #define PCI_CAP_EXP_RC_ENDPOINT_SIZEOF_V2 0x2c	/* end of v2 EPs w/o link */
 #define PCI_EXP_LNKCAP2		0x2c	/* Link Capabilities 2 */
+#define  PCI_EXP_LNKCAP2_SLS		0x000000fe /* Supported Link Speeds Vector */
 #define  PCI_EXP_LNKCAP2_SLS_2_5GB	0x00000002 /* Supported Speed 2.5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_5_0GB	0x00000004 /* Supported Speed 5GT/s */
 #define  PCI_EXP_LNKCAP2_SLS_8_0GB	0x00000008 /* Supported Speed 8GT/s */
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 06/10] PCI: Cache PCIe device's Supported Speed Vector
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (4 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 05/10] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-17 12:17 ` [PATCH 07/10] PCI/LINK: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
                   ` (4 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher,
	Ilpo Järvinen

The Supported Link Speeds Vector in the Link Capabilities Register 2
corresponds to the bus below on Root Ports and Downstream Ports,
whereas it corresponds to the bus above on Upstream Ports and
Endpoints. Only the former is currently cached in pcie_bus_speeds in
the struct pci_bus. The link speeds that are supported is the
intersection of these two.

Store the device's Supported Link Speeds Vector into the struct pci_bus
when the Function 0 is enumerated (the Multi-Function Devices must have
same speeds the same for all Functions) to be easily able to calculate
the intersection of Supported Link Speeds.

Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/probe.c  | 10 ++++++++++
 drivers/pci/remove.c |  2 ++
 include/linux/pci.h  |  1 +
 3 files changed, 13 insertions(+)

diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 586d44b5ed7a..56d32297deb0 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2564,6 +2564,7 @@ static void pci_set_msi_domain(struct pci_dev *dev)
 
 void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 {
+	u8 dev_speeds = 0;
 	int ret;
 
 	pci_configure_device(dev);
@@ -2590,11 +2591,20 @@ void pci_device_add(struct pci_dev *dev, struct pci_bus *bus)
 
 	pci_init_capabilities(dev);
 
+	if (pci_is_pcie(dev) && PCI_FUNC(dev->devfn) == 0) {
+		u32 linkcap, linkcap2;
+
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
+		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP2, &linkcap2);
+		dev_speeds = pcie_get_supported_speeds(linkcap, linkcap2);
+	}
 	/*
 	 * Add the device to our list of discovered devices
 	 * and the bus list for fixup functions, etc.
 	 */
 	down_write(&pci_bus_sem);
+	if (dev_speeds)
+		bus->pcie_dev_speeds = dev_speeds;
 	list_add_tail(&dev->bus_list, &bus->devices);
 	up_write(&pci_bus_sem);
 
diff --git a/drivers/pci/remove.c b/drivers/pci/remove.c
index d68aee29386b..7e80831467a8 100644
--- a/drivers/pci/remove.c
+++ b/drivers/pci/remove.c
@@ -35,6 +35,8 @@ static void pci_destroy_dev(struct pci_dev *dev)
 	device_del(&dev->dev);
 
 	down_write(&pci_bus_sem);
+	if (pci_is_pcie(dev) && PCI_FUNC(dev->devfn) == 0)
+		dev->bus->pcie_dev_speeds = 0;
 	list_del(&dev->bus_list);
 	up_write(&pci_bus_sem);
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index d14d92bb7eba..8bfc8895f671 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -665,6 +665,7 @@ struct pci_bus {
 	unsigned char	max_bus_speed;	/* enum pci_bus_speed */
 	unsigned char	cur_bus_speed;	/* enum pci_bus_speed */
 	u8		pcie_bus_speeds;/* Supported Link Speeds Vector (+ reserved 0 at LSB) */
+	u8		pcie_dev_speeds;/* Device's Supported Link Speeds Vector (+ 0 at LSB) */
 #ifdef CONFIG_PCI_DOMAINS_GENERIC
 	int		domain_nr;
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 07/10] PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (5 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 06/10] PCI: Cache PCIe device's Supported Speed Vector Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-17 12:17 ` [PATCH 08/10] PCI/bwctrl: Add "controller" part into PCIe bwctrl Ilpo Järvinen
                   ` (3 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, Ilpo Järvinen, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher

This mostly reverts b4c7d2076b4e ("PCI/LINK: Remove bandwidth
notification"), however, there are small tweaks:

1) Call it PCIe bwctrl (bandwidth controller) instead of just
   bandwidth notifications.
2) Don't print the notifications into kernel log, just keep the current
   link speed updated.
3) Use concurrency safe LNKCTL RMW operations.
4) Read link speed after enabling the notification to ensure the
   current link speed is correct from the start.
5) Add local variable in probe for srv->port.
6) Handle link speed read and LBMS write race in
   pcie_bw_notification_irq().

The reason for 1) is to indicate the increased scope of the driver. A
subsequent commit extends the driver to allow controlling PCIe
bandwidths from user space upon crossing thermal thresholds.

While 2) is somewhat unfortunate, the log spam was the source of
complaints that eventually lead to the removal of the bandwidth
notifications driver (see the links below for further information).
After re-adding this driver back the userspace can, if it wishes to,
observe the link speed changes using the current bus speed files under
sysfs.

Link: https://lore.kernel.org/all/20190429185611.121751-1-helgaas@kernel.org/
Link: https://lore.kernel.org/linux-pci/20190501142942.26972-1-keith.busch@intel.com/
Link: https://lore.kernel.org/linux-pci/20200115221008.GA191037@google.com/
Suggested-by: Lukas Wunner <lukas@wunner.de>
Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 drivers/pci/pcie/Kconfig   |   8 +++
 drivers/pci/pcie/Makefile  |   1 +
 drivers/pci/pcie/bwctrl.c  | 131 +++++++++++++++++++++++++++++++++++++
 drivers/pci/pcie/portdrv.c |   9 +--
 drivers/pci/pcie/portdrv.h |  10 ++-
 5 files changed, 153 insertions(+), 6 deletions(-)
 create mode 100644 drivers/pci/pcie/bwctrl.c

diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 228652a59f27..426dd81b48e4 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -137,6 +137,14 @@ config PCIE_PTM
 	  This is only useful if you have devices that support PTM, but it
 	  is safe to enable even if you don't.
 
+config PCIE_BW
+	bool "PCI Express Bandwidth Change Notification"
+	depends on PCIEPORTBUS
+	help
+	  This enables PCI Express Bandwidth Change Notification.  If
+	  you know link width or rate changes occur only to correct
+	  unreliable links, you may answer Y.
+
 config PCIE_EDR
 	bool "PCI Express Error Disconnect Recover support"
 	depends on PCIE_DPC && ACPI
diff --git a/drivers/pci/pcie/Makefile b/drivers/pci/pcie/Makefile
index 8de4ed5f98f1..175065a495cf 100644
--- a/drivers/pci/pcie/Makefile
+++ b/drivers/pci/pcie/Makefile
@@ -12,4 +12,5 @@ obj-$(CONFIG_PCIEAER_INJECT)	+= aer_inject.o
 obj-$(CONFIG_PCIE_PME)		+= pme.o
 obj-$(CONFIG_PCIE_DPC)		+= dpc.o
 obj-$(CONFIG_PCIE_PTM)		+= ptm.o
+obj-$(CONFIG_PCIE_BW)		+= bwctrl.o
 obj-$(CONFIG_PCIE_EDR)		+= edr.o
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
new file mode 100644
index 000000000000..4fc6718fc0e5
--- /dev/null
+++ b/drivers/pci/pcie/bwctrl.c
@@ -0,0 +1,131 @@
+// SPDX-License-Identifier: GPL-2.0+
+/*
+ * PCI Express Link Bandwidth Notification services driver
+ * Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
+ *
+ * Copyright (C) 2019, Dell Inc
+ *
+ * The PCIe Link Bandwidth Notification provides a way to notify the
+ * operating system when the link width or data rate changes.  This
+ * capability is required for all root ports and downstream ports
+ * supporting links wider than x1 and/or multiple link speeds.
+ *
+ * This service port driver hooks into the bandwidth notification interrupt
+ * watching for link speed changes or links becoming degraded in operation
+ * and updates the cached link speed exposed to user space.
+ */
+
+#define dev_fmt(fmt) "bwctrl: " fmt
+
+#include "../pci.h"
+#include "portdrv.h"
+
+static bool pcie_link_bandwidth_notification_supported(struct pci_dev *dev)
+{
+	int ret;
+	u32 lnk_cap;
+
+	ret = pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &lnk_cap);
+	return (ret == PCIBIOS_SUCCESSFUL) && (lnk_cap & PCI_EXP_LNKCAP_LBNC);
+}
+
+static void pcie_enable_link_bandwidth_notification(struct pci_dev *dev)
+{
+	u16 link_status;
+	int ret;
+
+	pcie_capability_write_word(dev, PCI_EXP_LNKSTA, PCI_EXP_LNKSTA_LBMS);
+	pcie_capability_set_word(dev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LBMIE);
+
+	/* Read after enabling notifications to ensure link speed is up to date */
+	ret = pcie_capability_read_word(dev, PCI_EXP_LNKSTA, &link_status);
+	if (ret == PCIBIOS_SUCCESSFUL)
+		pcie_update_link_speed(dev->subordinate, link_status);
+}
+
+static void pcie_disable_link_bandwidth_notification(struct pci_dev *dev)
+{
+	pcie_capability_clear_word(dev, PCI_EXP_LNKCTL, PCI_EXP_LNKCTL_LBMIE);
+}
+
+static irqreturn_t pcie_bw_notification_irq(int irq, void *context)
+{
+	struct pcie_device *srv = context;
+	struct pci_dev *port = srv->port;
+	u16 link_status, events;
+	int ret;
+
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	events = link_status & PCI_EXP_LNKSTA_LBMS;
+
+	if (ret != PCIBIOS_SUCCESSFUL || !events)
+		return IRQ_NONE;
+
+	pcie_capability_write_word(port, PCI_EXP_LNKSTA, events);
+
+	/*
+	 * The write to clear LBMS prevents getting interrupt from the
+	 * latest link speed when the link speed changes between the above
+	 * LNKSTA read and write. Therefore, re-read the speed before
+	 * updating it.
+	 */
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	if (ret != PCIBIOS_SUCCESSFUL)
+		return IRQ_HANDLED;
+	pcie_update_link_speed(port->subordinate, link_status);
+
+	return IRQ_HANDLED;
+}
+
+static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
+{
+	struct pci_dev *port = srv->port;
+	int ret;
+
+	/* Single-width or single-speed ports do not have to support this. */
+	if (!pcie_link_bandwidth_notification_supported(port))
+		return -ENODEV;
+
+	ret = request_irq(srv->irq, pcie_bw_notification_irq,
+			  IRQF_SHARED, "PCIe BW ctrl", srv);
+	if (ret)
+		return ret;
+
+	pcie_enable_link_bandwidth_notification(port);
+	pci_info(port, "enabled with IRQ %d\n", srv->irq);
+
+	return 0;
+}
+
+static void pcie_bandwidth_notification_remove(struct pcie_device *srv)
+{
+	pcie_disable_link_bandwidth_notification(srv->port);
+	free_irq(srv->irq, srv);
+}
+
+static int pcie_bandwidth_notification_suspend(struct pcie_device *srv)
+{
+	pcie_disable_link_bandwidth_notification(srv->port);
+	return 0;
+}
+
+static int pcie_bandwidth_notification_resume(struct pcie_device *srv)
+{
+	pcie_enable_link_bandwidth_notification(srv->port);
+	return 0;
+}
+
+static struct pcie_port_service_driver pcie_bandwidth_notification_driver = {
+	.name		= "pcie_bwctrl",
+	.port_type	= PCIE_ANY_PORT,
+	.service	= PCIE_PORT_SERVICE_BWCTRL,
+	.probe		= pcie_bandwidth_notification_probe,
+	.suspend	= pcie_bandwidth_notification_suspend,
+	.resume		= pcie_bandwidth_notification_resume,
+	.remove		= pcie_bandwidth_notification_remove,
+};
+
+int __init pcie_bwctrl_init(void)
+{
+	return pcie_port_service_register(&pcie_bandwidth_notification_driver);
+}
diff --git a/drivers/pci/pcie/portdrv.c b/drivers/pci/pcie/portdrv.c
index 46fad0d813b2..ed33049bffd6 100644
--- a/drivers/pci/pcie/portdrv.c
+++ b/drivers/pci/pcie/portdrv.c
@@ -67,7 +67,7 @@ static int pcie_message_numbers(struct pci_dev *dev, int mask,
 	 */
 
 	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
-		    PCIE_PORT_SERVICE_BWNOTIF)) {
+		    PCIE_PORT_SERVICE_BWCTRL)) {
 		pcie_capability_read_word(dev, PCI_EXP_FLAGS, &reg16);
 		*pme = (reg16 & PCI_EXP_FLAGS_IRQ) >> 9;
 		nvec = *pme + 1;
@@ -149,11 +149,11 @@ static int pcie_port_enable_irq_vec(struct pci_dev *dev, int *irqs, int mask)
 
 	/* PME, hotplug and bandwidth notification share an MSI/MSI-X vector */
 	if (mask & (PCIE_PORT_SERVICE_PME | PCIE_PORT_SERVICE_HP |
-		    PCIE_PORT_SERVICE_BWNOTIF)) {
+		    PCIE_PORT_SERVICE_BWCTRL)) {
 		pcie_irq = pci_irq_vector(dev, pme);
 		irqs[PCIE_PORT_SERVICE_PME_SHIFT] = pcie_irq;
 		irqs[PCIE_PORT_SERVICE_HP_SHIFT] = pcie_irq;
-		irqs[PCIE_PORT_SERVICE_BWNOTIF_SHIFT] = pcie_irq;
+		irqs[PCIE_PORT_SERVICE_BWCTRL_SHIFT] = pcie_irq;
 	}
 
 	if (mask & PCIE_PORT_SERVICE_AER)
@@ -270,7 +270,7 @@ static int get_port_device_capability(struct pci_dev *dev)
 
 		pcie_capability_read_dword(dev, PCI_EXP_LNKCAP, &linkcap);
 		if (linkcap & PCI_EXP_LNKCAP_LBNC)
-			services |= PCIE_PORT_SERVICE_BWNOTIF;
+			services |= PCIE_PORT_SERVICE_BWCTRL;
 	}
 
 	return services;
@@ -828,6 +828,7 @@ static void __init pcie_init_services(void)
 	pcie_pme_init();
 	pcie_dpc_init();
 	pcie_hp_init();
+	pcie_bwctrl_init();
 }
 
 static int __init pcie_portdrv_init(void)
diff --git a/drivers/pci/pcie/portdrv.h b/drivers/pci/pcie/portdrv.h
index 58a2b1a1cae4..f622c8a02a5b 100644
--- a/drivers/pci/pcie/portdrv.h
+++ b/drivers/pci/pcie/portdrv.h
@@ -20,8 +20,8 @@
 #define PCIE_PORT_SERVICE_HP		(1 << PCIE_PORT_SERVICE_HP_SHIFT)
 #define PCIE_PORT_SERVICE_DPC_SHIFT	3	/* Downstream Port Containment */
 #define PCIE_PORT_SERVICE_DPC		(1 << PCIE_PORT_SERVICE_DPC_SHIFT)
-#define PCIE_PORT_SERVICE_BWNOTIF_SHIFT	4	/* Bandwidth notification */
-#define PCIE_PORT_SERVICE_BWNOTIF	(1 << PCIE_PORT_SERVICE_BWNOTIF_SHIFT)
+#define PCIE_PORT_SERVICE_BWCTRL_SHIFT	4	/* Bandwidth Controller (notifications) */
+#define PCIE_PORT_SERVICE_BWCTRL	(1 << PCIE_PORT_SERVICE_BWCTRL_SHIFT)
 
 #define PCIE_PORT_DEVICE_MAXSERVICES   5
 
@@ -53,6 +53,12 @@ int pcie_dpc_init(void);
 static inline int pcie_dpc_init(void) { return 0; }
 #endif
 
+#ifdef CONFIG_PCIE_BW
+int pcie_bwctrl_init(void);
+#else
+static inline int pcie_bwctrl_init(void) { return 0; }
+#endif
+
 /* Port Type */
 #define PCIE_ANY_PORT			(~0)
 
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 08/10] PCI/bwctrl: Add "controller" part into PCIe bwctrl
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (6 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 07/10] PCI/LINK: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-17 12:17 ` [PATCH 09/10] thermal: Add PCIe cooling driver Ilpo Järvinen
                   ` (2 subsequent siblings)
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, Ilpo Järvinen, linux-kernel
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher

Add "controller" parts into PCIe bwctrl for limiting PCIe Link Speed
(due to thermal reasons).

PCIe bandwidth controller introduces an in-kernel API to set PCIe Link
Speed. This new API is intended to be used in an upcoming commit that
adds a thermal cooling device to throttle PCIe bandwidth when thermal
thresholds are reached. No users are introduced in this commit yet.

The PCIe bandwidth control procedure is as follows. The requested speed
is validated against Link Speeds supported by the port and downstream
device. Then bandwidth controller sets the Target Link Speed in the
Link Control 2 Register and retrains the PCIe Link.

Bandwidth notifications enable the cur_bus_speed in the struct pci_bus
to keep track PCIe Link Speed changes. This keeps the link speed seen
through sysfs correct (both for PCI device and thermal cooling device).
While bandwidth notifications should also be generated when bandwidth
controller alters the PCIe Link Speed, a few platforms do not deliver
LMBS interrupt after Link Training as expected. Thus, after changing
the Link Speed, bandwidth controller makes additional read for the Link
Status Register to ensure cur_bus_speed is consistent with the new PCIe
Link Speed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 MAINTAINERS                |   6 ++
 drivers/pci/pcie/Kconfig   |   9 +-
 drivers/pci/pcie/bwctrl.c  | 177 +++++++++++++++++++++++++++++++++++--
 include/linux/pci-bwctrl.h |  18 ++++
 4 files changed, 201 insertions(+), 9 deletions(-)
 create mode 100644 include/linux/pci-bwctrl.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 3be1bdfe8ecc..d2eed2883a43 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16417,6 +16417,12 @@ F:	include/linux/pci*
 F:	include/uapi/linux/pci*
 F:	lib/pci*
 
+PCIE BANDWIDTH CONTROLLER
+M:	Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
+S:	Supported
+F:	drivers/pci/pcie/bwctrl.c
+F:	include/linux/pci-bwctrl.h
+
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
 M:	Jonathan Chocron <jonnyc@amazon.com>
 L:	linux-pci@vger.kernel.org
diff --git a/drivers/pci/pcie/Kconfig b/drivers/pci/pcie/Kconfig
index 426dd81b48e4..dd30efb8299a 100644
--- a/drivers/pci/pcie/Kconfig
+++ b/drivers/pci/pcie/Kconfig
@@ -138,12 +138,13 @@ config PCIE_PTM
 	  is safe to enable even if you don't.
 
 config PCIE_BW
-	bool "PCI Express Bandwidth Change Notification"
+	bool "PCI Express Bandwidth Controller"
 	depends on PCIEPORTBUS
 	help
-	  This enables PCI Express Bandwidth Change Notification.  If
-	  you know link width or rate changes occur only to correct
-	  unreliable links, you may answer Y.
+	  This enables PCI Express Bandwidth Controller. The Bandwidth
+	  Controller allows controlling PCIe link speed and listens for link
+	  speed Change Notifications. If you know link width or rate changes
+	  occur only to correct unreliable links, you may answer Y.
 
 config PCIE_EDR
 	bool "PCI Express Error Disconnect Recover support"
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index 4fc6718fc0e5..e3172d69476f 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -1,14 +1,16 @@
 // SPDX-License-Identifier: GPL-2.0+
 /*
- * PCI Express Link Bandwidth Notification services driver
+ * PCIe bandwidth controller
+ *
  * Author: Alexandru Gagniuc <mr.nuke.me@gmail.com>
  *
  * Copyright (C) 2019, Dell Inc
+ * Copyright (C) 2023 Intel Corporation.
  *
- * The PCIe Link Bandwidth Notification provides a way to notify the
- * operating system when the link width or data rate changes.  This
- * capability is required for all root ports and downstream ports
- * supporting links wider than x1 and/or multiple link speeds.
+ * The PCIe Bandwidth Controller provides a way to alter PCIe link speeds
+ * and notify the operating system when the link width or data rate changes.
+ * The notification capability is required for all Root Ports and Downstream
+ * Ports supporting links wider than x1 and/or multiple link speeds.
  *
  * This service port driver hooks into the bandwidth notification interrupt
  * watching for link speed changes or links becoming degraded in operation
@@ -17,9 +19,48 @@
 
 #define dev_fmt(fmt) "bwctrl: " fmt
 
+#include <linux/bitops.h>
+#include <linux/errno.h>
+#include <linux/mutex.h>
+#include <linux/pci.h>
+#include <linux/pci-bwctrl.h>
+#include <linux/types.h>
+
+#include <asm/rwonce.h>
+
 #include "../pci.h"
 #include "portdrv.h"
 
+/**
+ * struct bwctrl_service_data - PCIe Port Bandwidth Controller
+ * @set_speed_mutex: serializes link speed changes
+ */
+struct bwctrl_service_data {
+	struct mutex set_speed_mutex;
+};
+
+static bool bwctrl_valid_pcie_speed(enum pci_bus_speed speed)
+{
+	return (speed >= PCIE_SPEED_2_5GT) && (speed <= PCIE_SPEED_64_0GT);
+}
+
+static u16 speed2lnkctl2(enum pci_bus_speed speed)
+{
+	static const u16 speed_conv[] = {
+		[PCIE_SPEED_2_5GT] = PCI_EXP_LNKCTL2_TLS_2_5GT,
+		[PCIE_SPEED_5_0GT] = PCI_EXP_LNKCTL2_TLS_5_0GT,
+		[PCIE_SPEED_8_0GT] = PCI_EXP_LNKCTL2_TLS_8_0GT,
+		[PCIE_SPEED_16_0GT] = PCI_EXP_LNKCTL2_TLS_16_0GT,
+		[PCIE_SPEED_32_0GT] = PCI_EXP_LNKCTL2_TLS_32_0GT,
+		[PCIE_SPEED_64_0GT] = PCI_EXP_LNKCTL2_TLS_64_0GT,
+	};
+
+	if (WARN_ON_ONCE(!bwctrl_valid_pcie_speed(speed)))
+		return 0;
+
+	return speed_conv[speed];
+}
+
 static bool pcie_link_bandwidth_notification_supported(struct pci_dev *dev)
 {
 	int ret;
@@ -77,8 +118,118 @@ static irqreturn_t pcie_bw_notification_irq(int irq, void *context)
 	return IRQ_HANDLED;
 }
 
+/* Configure target speed to the requested speed and set train link */
+static int bwctrl_set_speed(struct pci_dev *port, u16 lnkctl2_speed)
+{
+	int ret;
+
+	ret = pcie_capability_clear_and_set_word(port, PCI_EXP_LNKCTL2,
+						 PCI_EXP_LNKCTL2_TLS, lnkctl2_speed);
+	if (ret != PCIBIOS_SUCCESSFUL)
+		return pcibios_err_to_errno(ret);
+
+	return 0;
+}
+
+static int bwctrl_select_speed(struct pcie_device *srv, enum pci_bus_speed *speed)
+{
+	struct pci_bus *bus = srv->port->subordinate;
+	u8 speeds, dev_speeds;
+	int i;
+
+	if (*speed > PCIE_LNKCAP2_SLS2SPEED(bus->pcie_bus_speeds))
+		return -EINVAL;
+
+	dev_speeds = READ_ONCE(bus->pcie_dev_speeds);
+	/* Only the lowest speed can be set when there are no devices */
+	if (!dev_speeds)
+		dev_speeds = PCI_EXP_LNKCAP2_SLS_2_5GB;
+
+	/*
+	 * Implementation Note in PCIe r6.0.1 sec 7.5.3.18 recommends OS to
+	 * utilize Supported Link Speeds vector for determining which link
+	 * speeds are supported.
+	 *
+	 * Take into account Supported Link Speeds both from the Root Port
+	 * and the device.
+	 */
+	speeds = bus->pcie_bus_speeds & dev_speeds;
+	i = BIT(fls(speeds));
+	while (i >= PCI_EXP_LNKCAP2_SLS_2_5GB) {
+		enum pci_bus_speed candidate;
+
+		if (speeds & i) {
+			candidate = PCIE_LNKCAP2_SLS2SPEED(i);
+			if (candidate <= *speed) {
+				*speed = candidate;
+				return 0;
+			}
+		}
+		i >>= 1;
+	}
+
+	return -EINVAL;
+}
+
+/**
+ * bwctrl_set_current_speed - Set downstream link speed for PCIe port
+ * @srv: PCIe port
+ * @speed: PCIe bus speed to set
+ *
+ * Attempts to set PCIe port link speed to @speed. As long as @speed is less
+ * than the maximum of what is supported by @srv, the speed is adjusted
+ * downwards to the best speed supported by both the port and device
+ * underneath it.
+ *
+ * Return:
+ * * 0 - on success
+ * * -EINVAL - @speed is higher than the maximum @srv supports
+ * * -ETIMEDOUT - changing link speed took too long
+ * * -EAGAIN - link speed was changed but @speed was not achieved
+ */
+int bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed)
+{
+	struct bwctrl_service_data *data = get_service_data(srv);
+	struct pci_dev *port = srv->port;
+	u16 link_status;
+	int ret;
+
+	if (WARN_ON_ONCE(!bwctrl_valid_pcie_speed(speed)))
+		return -EINVAL;
+
+	ret = bwctrl_select_speed(srv, &speed);
+	if (ret < 0)
+		return ret;
+
+	mutex_lock(&data->set_speed_mutex);
+	ret = bwctrl_set_speed(port, speed2lnkctl2(speed));
+	if (ret < 0)
+		goto unlock;
+
+	ret = pcie_retrain_link(port, true);
+	if (ret < 0)
+		goto unlock;
+
+	/*
+	 * Ensure link speed updates also with platforms that have problems
+	 * with notifications
+	 */
+	ret = pcie_capability_read_word(port, PCI_EXP_LNKSTA, &link_status);
+	if (ret == PCIBIOS_SUCCESSFUL)
+		pcie_update_link_speed(port->subordinate, link_status);
+
+	if (port->subordinate->cur_bus_speed != speed)
+		ret = -EAGAIN;
+
+unlock:
+	mutex_unlock(&data->set_speed_mutex);
+
+	return ret;
+}
+
 static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
 {
+	struct bwctrl_service_data *data;
 	struct pci_dev *port = srv->port;
 	int ret;
 
@@ -91,16 +242,32 @@ static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
 	if (ret)
 		return ret;
 
+	data = kzalloc(sizeof(*data), GFP_KERNEL);
+	if (!data) {
+		ret = -ENOMEM;
+		goto free_irq;
+	}
+	mutex_init(&data->set_speed_mutex);
+	set_service_data(srv, data);
+
 	pcie_enable_link_bandwidth_notification(port);
 	pci_info(port, "enabled with IRQ %d\n", srv->irq);
 
 	return 0;
+
+free_irq:
+	free_irq(srv->irq, srv);
+	return ret;
 }
 
 static void pcie_bandwidth_notification_remove(struct pcie_device *srv)
 {
+	struct bwctrl_service_data *data = get_service_data(srv);
+
 	pcie_disable_link_bandwidth_notification(srv->port);
 	free_irq(srv->irq, srv);
+	mutex_destroy(&data->set_speed_mutex);
+	kfree(data);
 }
 
 static int pcie_bandwidth_notification_suspend(struct pcie_device *srv)
diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
new file mode 100644
index 000000000000..46026fa25deb
--- /dev/null
+++ b/include/linux/pci-bwctrl.h
@@ -0,0 +1,18 @@
+/* SPDX-License-Identifier: GPL-2.0-only */
+/*
+ * PCIe bandwidth controller
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ */
+
+#ifndef LINUX_PCI_BWCTRL_H
+#define LINUX_PCI_BWCTRL_H
+
+#include <linux/pci.h>
+
+struct pcie_device;
+struct thermal_cooling_device;
+
+int bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed);
+
+#endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 09/10] thermal: Add PCIe cooling driver
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (7 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 08/10] PCI/bwctrl: Add "controller" part into PCIe bwctrl Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-08-23 19:47   ` Rafael J. Wysocki
  2023-08-17 12:17 ` [PATCH 10/10] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen
  2023-09-04  6:26 ` [PATCH 00/10] Add PCIe Bandwidth Controller Krishna Chaitanya Chundru
  10 siblings, 1 reply; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Ilpo Järvinen, Bjorn Helgaas, Rafael J. Wysocki,
	Daniel Lezcano, Amit Kucheria, Zhang Rui, linux-kernel, linux-pm
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher

Add a thermal cooling driver to provide path to access PCIe bandwidth
controller using the usual thermal interfaces.

A cooling device is instantiated for controllable PCIe ports from the
bwctrl service driver.

The thermal side state 0 means no throttling, i.e., maximum supported
PCIe speed.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 MAINTAINERS                    |   1 +
 drivers/pci/pcie/bwctrl.c      |  11 ++++
 drivers/thermal/Kconfig        |  10 +++
 drivers/thermal/Makefile       |   2 +
 drivers/thermal/pcie_cooling.c | 107 +++++++++++++++++++++++++++++++++
 include/linux/pci-bwctrl.h     |  15 +++++
 6 files changed, 146 insertions(+)
 create mode 100644 drivers/thermal/pcie_cooling.c

diff --git a/MAINTAINERS b/MAINTAINERS
index d2eed2883a43..a0b40253fd5a 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16421,6 +16421,7 @@ PCIE BANDWIDTH CONTROLLER
 M:	Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
 S:	Supported
 F:	drivers/pci/pcie/bwctrl.c
+F:	drivers/thermal/pcie_cooling.c
 F:	include/linux/pci-bwctrl.h
 
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
index e3172d69476f..13c73546244e 100644
--- a/drivers/pci/pcie/bwctrl.c
+++ b/drivers/pci/pcie/bwctrl.c
@@ -34,9 +34,11 @@
 /**
  * struct bwctrl_service_data - PCIe Port Bandwidth Controller
  * @set_speed_mutex: serializes link speed changes
+ * @cdev: thermal cooling device associated with the port
  */
 struct bwctrl_service_data {
 	struct mutex set_speed_mutex;
+	struct thermal_cooling_device *cdev;
 };
 
 static bool bwctrl_valid_pcie_speed(enum pci_bus_speed speed)
@@ -253,8 +255,16 @@ static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
 	pcie_enable_link_bandwidth_notification(port);
 	pci_info(port, "enabled with IRQ %d\n", srv->irq);
 
+	data->cdev = pcie_cooling_device_register(port, srv);
+	if (IS_ERR(data->cdev)) {
+		ret = PTR_ERR(data->cdev);
+		goto disable_notifications;
+	}
 	return 0;
 
+disable_notifications:
+	pcie_disable_link_bandwidth_notification(srv->port);
+	kfree(data);
 free_irq:
 	free_irq(srv->irq, srv);
 	return ret;
@@ -264,6 +274,7 @@ static void pcie_bandwidth_notification_remove(struct pcie_device *srv)
 {
 	struct bwctrl_service_data *data = get_service_data(srv);
 
+	pcie_cooling_device_unregister(data->cdev);
 	pcie_disable_link_bandwidth_notification(srv->port);
 	free_irq(srv->irq, srv);
 	mutex_destroy(&data->set_speed_mutex);
diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
index 19a4b33cb564..7deda3a0237d 100644
--- a/drivers/thermal/Kconfig
+++ b/drivers/thermal/Kconfig
@@ -219,6 +219,16 @@ config DEVFREQ_THERMAL
 
 	  If you want this support, you should say Y here.
 
+config PCIE_THERMAL
+	bool "PCIe cooling support"
+	depends on PCIEPORTBUS
+	select PCIE_BW
+	help
+	  This implements PCIe cooling mechanism through bandwidth reduction
+	  for PCIe devices.
+
+	  If you want this support, you should say Y here.
+
 config THERMAL_EMULATION
 	bool "Thermal emulation mode support"
 	help
diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
index 058664bc3ec0..065972a08c84 100644
--- a/drivers/thermal/Makefile
+++ b/drivers/thermal/Makefile
@@ -30,6 +30,8 @@ thermal_sys-$(CONFIG_CPU_IDLE_THERMAL)	+= cpuidle_cooling.o
 # devfreq cooling
 thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
 
+thermal_sys-$(CONFIG_PCIE_THERMAL) += pcie_cooling.o
+
 obj-$(CONFIG_K3_THERMAL)	+= k3_bandgap.o k3_j72xx_bandgap.o
 # platform thermal drivers
 obj-y				+= broadcom/
diff --git a/drivers/thermal/pcie_cooling.c b/drivers/thermal/pcie_cooling.c
new file mode 100644
index 000000000000..d86265c03e80
--- /dev/null
+++ b/drivers/thermal/pcie_cooling.c
@@ -0,0 +1,107 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * PCIe cooling device
+ *
+ * Copyright (C) 2023 Intel Corporation.
+ */
+
+#include <linux/build_bug.h>
+#include <linux/err.h>
+#include <linux/kernel.h>
+#include <linux/pci.h>
+#include <linux/pci-bwctrl.h>
+#include <linux/slab.h>
+#include <linux/string.h>
+#include <linux/thermal.h>
+
+#define COOLING_DEV_TYPE_PREFIX		"PCIe_Port_"
+
+struct pcie_cooling_device {
+	struct pci_dev *port;
+	struct pcie_device *pdev;
+};
+
+static int pcie_cooling_get_max_level(struct thermal_cooling_device *cdev, unsigned long *state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	/* cooling state 0 is same as the maximum PCIe speed */
+	*state = pcie_cdev->port->subordinate->max_bus_speed - PCIE_SPEED_2_5GT;
+
+	return 0;
+}
+
+static int pcie_cooling_get_cur_level(struct thermal_cooling_device *cdev, unsigned long *state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	/* cooling state 0 is same as the maximum PCIe speed */
+	*state = cdev->max_state -
+		 (pcie_cdev->port->subordinate->cur_bus_speed - PCIE_SPEED_2_5GT);
+
+	return 0;
+}
+
+static int pcie_cooling_set_cur_level(struct thermal_cooling_device *cdev, unsigned long state)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+	enum pci_bus_speed speed;
+
+	/* cooling state 0 is same as the maximum PCIe speed */
+	speed = (cdev->max_state - state) + PCIE_SPEED_2_5GT;
+
+	return bwctrl_set_current_speed(pcie_cdev->pdev, speed);
+}
+
+static struct thermal_cooling_device_ops pcie_cooling_ops = {
+	.get_max_state = pcie_cooling_get_max_level,
+	.get_cur_state = pcie_cooling_get_cur_level,
+	.set_cur_state = pcie_cooling_set_cur_level,
+};
+
+struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+							    struct pcie_device *pdev)
+{
+	struct pcie_cooling_device *pcie_cdev;
+	struct thermal_cooling_device *cdev;
+	size_t name_len;
+	char *name;
+
+	pcie_cdev = kzalloc(sizeof(*pcie_cdev), GFP_KERNEL);
+	if (!pcie_cdev)
+		return ERR_PTR(-ENOMEM);
+
+	pcie_cdev->port = port;
+	pcie_cdev->pdev = pdev;
+
+	name_len = strlen(COOLING_DEV_TYPE_PREFIX) + strlen(pci_name(port)) + 1;
+	name = kzalloc(name_len, GFP_KERNEL);
+	if (!name) {
+		kfree(pcie_cdev);
+		return ERR_PTR(-ENOMEM);
+	}
+
+	snprintf(name, name_len, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port));
+	cdev = thermal_cooling_device_register(name, pcie_cdev, &pcie_cooling_ops);
+	kfree(name);
+
+	return cdev;
+}
+
+void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
+{
+	struct pcie_cooling_device *pcie_cdev = cdev->devdata;
+
+	thermal_cooling_device_unregister(cdev);
+	kfree(pcie_cdev);
+}
+
+/* For bus_speed <-> state arithmetic */
+static_assert(PCIE_SPEED_2_5GT + 1 == PCIE_SPEED_5_0GT);
+static_assert(PCIE_SPEED_5_0GT + 1 == PCIE_SPEED_8_0GT);
+static_assert(PCIE_SPEED_8_0GT + 1 == PCIE_SPEED_16_0GT);
+static_assert(PCIE_SPEED_16_0GT + 1 == PCIE_SPEED_32_0GT);
+static_assert(PCIE_SPEED_32_0GT + 1 == PCIE_SPEED_64_0GT);
+
+MODULE_AUTHOR("Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>");
+MODULE_DESCRIPTION("PCIe cooling driver");
diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
index 46026fa25deb..366445517b72 100644
--- a/include/linux/pci-bwctrl.h
+++ b/include/linux/pci-bwctrl.h
@@ -15,4 +15,19 @@ struct thermal_cooling_device;
 
 int bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed);
 
+#ifdef CONFIG_PCIE_THERMAL
+struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+							    struct pcie_device *pdev);
+void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev);
+#else
+static inline struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
+									  struct pcie_device *pdev)
+{
+	return NULL;
+}
+static inline void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
+{
+}
+#endif
+
 #endif
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* [PATCH 10/10] selftests/pcie_bwctrl: Create selftests
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (8 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 09/10] thermal: Add PCIe cooling driver Ilpo Järvinen
@ 2023-08-17 12:17 ` Ilpo Järvinen
  2023-09-04  6:26 ` [PATCH 00/10] Add PCIe Bandwidth Controller Krishna Chaitanya Chundru
  10 siblings, 0 replies; 24+ messages in thread
From: Ilpo Järvinen @ 2023-08-17 12:17 UTC (permalink / raw)
  To: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Shuah Khan, Ilpo Järvinen, linux-kernel, linux-kselftest
  Cc: Krishna chaitanya chundru, Srinivas Pandruvada, Alex Deucher

Create selftests for PCIe BW control through the PCIe cooling device
sysfs interface.

First, the BW control selftest finds the PCIe port to test with. By
default, the PCIe port with the highest bus speed is selected but
another PCIe port can be provided with -d parameter.

The actual test steps the cur_state of the cooling device one-by-one
from max_state to what the cur_state was initially. The speed change
is confirmed by observing the current_link_speed for the corresponding
PCIe port.

Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
---
 MAINTAINERS                                   |   1 +
 tools/testing/selftests/Makefile              |   1 +
 tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
 .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 ++++++++++++++++++
 .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++++++++
 5 files changed, 193 insertions(+)
 create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
 create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh

diff --git a/MAINTAINERS b/MAINTAINERS
index a0b40253fd5a..10146eba993f 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -16423,6 +16423,7 @@ S:	Supported
 F:	drivers/pci/pcie/bwctrl.c
 F:	drivers/thermal/pcie_cooling.c
 F:	include/linux/pci-bwctrl.h
+F:	tools/testing/selftests/pcie_bwctrl/
 
 PCIE DRIVER FOR AMAZON ANNAPURNA LABS
 M:	Jonathan Chocron <jonnyc@amazon.com>
diff --git a/tools/testing/selftests/Makefile b/tools/testing/selftests/Makefile
index 666b56f22a41..b8ebc324186e 100644
--- a/tools/testing/selftests/Makefile
+++ b/tools/testing/selftests/Makefile
@@ -56,6 +56,7 @@ TARGETS += net/mptcp
 TARGETS += net/openvswitch
 TARGETS += netfilter
 TARGETS += nsfs
+TARGETS += pcie_bwctrl
 TARGETS += pidfd
 TARGETS += pid_namespace
 TARGETS += powerpc
diff --git a/tools/testing/selftests/pcie_bwctrl/Makefile b/tools/testing/selftests/pcie_bwctrl/Makefile
new file mode 100644
index 000000000000..3e84e26341d1
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/Makefile
@@ -0,0 +1,2 @@
+TEST_PROGS = set_pcie_cooling_state.sh
+include ../lib.mk
diff --git a/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh b/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
new file mode 100755
index 000000000000..6edea316befd
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
@@ -0,0 +1,122 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+SYSFS=
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+retval=0
+skipmsg="skip all tests:"
+
+PCIEPORTTYPE="PCIe_Port"
+
+prerequisite()
+{
+	local ports
+
+	if [ $UID != 0 ]; then
+		echo $skipmsg must be run as root >&2
+		exit $ksft_skip
+	fi
+
+	SYSFS=`mount -t sysfs | head -1 | awk '{ print $3 }'`
+
+	if [ ! -d "$SYSFS" ]; then
+		echo $skipmsg sysfs is not mounted >&2
+		exit $ksft_skip
+	fi
+
+	if ! ls $SYSFS/class/thermal/cooling_device* > /dev/null 2>&1; then
+		echo $skipmsg thermal cooling devices missing >&2
+		exit $ksft_skip
+        fi
+
+	ports=`grep -e "^$PCIEPORTTYPE" $SYSFS/class/thermal/cooling_device*/type | wc -l`
+	if [ $ports -eq 0 ]; then
+		echo $skipmsg pcie cooling devices missing >&2
+		exit $ksft_skip
+	fi
+}
+
+testport=
+find_pcie_port()
+{
+	local patt="$1"
+	local pcieports
+	local max
+	local cur
+	local delta
+	local bestdelta=-1
+
+	pcieports=`grep -l -F -e "$patt" /sys/class/thermal/cooling_device*/type`
+	if [ -z "$pcieports" ]; then
+		return
+	fi
+	pcieports=${pcieports//\/type/}
+	# Find the port with highest PCIe bus speed
+	for port in $pcieports; do
+		max=`cat $port/max_state`
+		cur=`cat $port/cur_state`
+		delta=$((max-cur))
+		if [ $delta -gt $bestdelta ]; then
+			testport="$port"
+			bestdelta=$delta
+		fi
+	done
+}
+
+sysfspcidev=
+find_sysfs_pci_dev()
+{
+	local typefile="$1/type"
+	local pcidir
+
+	pcidir="$SYSFS/bus/pci/devices/`sed -e "s|^${PCIEPORTTYPE}_||g" $typefile`"
+
+	if [ -r "$pcidir/current_link_speed" ]; then
+		sysfspcidev="$pcidir/current_link_speed"
+	fi
+}
+
+usage()
+{
+	echo "Usage $0 [ -d dev ]"
+	echo -e "\t-d: PCIe port BDF string (e.g., 0000:00:04.0)"
+}
+
+pattern="$PCIEPORTTYPE"
+parse_arguments()
+{
+	while getopts d:h opt; do
+		case $opt in
+			h)
+				usage "$0"
+				exit 0
+				;;
+			d)
+				pattern="$PCIEPORTTYPE_$OPTARG"
+				;;
+			*)
+				usage "$0"
+				exit 0
+				;;
+		esac
+	done
+}
+
+parse_arguments "$@"
+prerequisite
+find_pcie_port "$pattern"
+if [ -z "$testport" ]; then
+	echo $skipmsg "pcie cooling device not found from sysfs" >&2
+	exit $ksft_skip
+fi
+find_sysfs_pci_dev "$testport"
+if [ -z "$sysfspcidev" ]; then
+	echo $skipmsg "PCIe port device not found from sysfs" >&2
+	exit $ksft_skip
+fi
+
+./set_pcie_speed.sh "$testport" "$sysfspcidev"
+retval=$?
+
+exit $retval
diff --git a/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh b/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
new file mode 100755
index 000000000000..584596949312
--- /dev/null
+++ b/tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
@@ -0,0 +1,67 @@
+#!/bin/bash
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+set -e
+
+TESTNAME=set_pcie_speed
+
+declare -a PCIELINKSPEED=(
+	"2.5 GT/s PCIe"
+	"5.0 GT/s PCIe"
+	"8.0 GT/s PCIe"
+	"16.0 GT/s PCIe"
+	"32.0 GT/s PCIe"
+	"64.0 GT/s PCIe"
+)
+
+# Kselftest framework requirement - SKIP code is 4.
+ksft_skip=4
+retval=0
+
+coolingdev="$1"
+statefile="$coolingdev/cur_state"
+maxfile="$coolingdev/max_state"
+linkspeedfile="$2"
+
+oldstate=`cat $statefile`
+maxstate=`cat $maxfile`
+
+set_state()
+{
+	local state=$1
+	local linkspeed
+	local expected_linkspeed
+
+	echo $state > $statefile
+
+	sleep 1
+
+	linkspeed="`cat $linkspeedfile`"
+	expected_linkspeed=$((maxstate-state))
+	expected_str="${PCIELINKSPEED[$expected_linkspeed]}"
+	if [ ! "${expected_str}" = "${linkspeed}" ]; then
+		echo "$TESTNAME failed: expected: ${expected_str}; got ${linkspeed}"
+		retval=1
+	fi
+}
+
+cleanup_skip ()
+{
+	set_state $oldstate
+	exit $ksft_skip
+}
+
+trap cleanup_skip EXIT
+
+echo "$TESTNAME: testing states $maxstate .. $oldstate with $coolingdev"
+for i in $(seq $maxstate -1 $oldstate); do
+	set_state "$i"
+done
+
+trap EXIT
+if [ $retval -eq 0 ]; then
+	echo "$TESTNAME [PASS]"
+else
+	echo "$TESTNAME [FAIL]"
+fi
+exit $retval
-- 
2.30.2


^ permalink raw reply related	[flat|nested] 24+ messages in thread

* Re: [PATCH 09/10] thermal: Add PCIe cooling driver
  2023-08-17 12:17 ` [PATCH 09/10] thermal: Add PCIe cooling driver Ilpo Järvinen
@ 2023-08-23 19:47   ` Rafael J. Wysocki
  0 siblings, 0 replies; 24+ messages in thread
From: Rafael J. Wysocki @ 2023-08-23 19:47 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc,
	Bjorn Helgaas, Rafael J. Wysocki, Daniel Lezcano, Amit Kucheria,
	Zhang Rui, linux-kernel, linux-pm, Krishna chaitanya chundru,
	Srinivas Pandruvada, Alex Deucher

On Thu, Aug 17, 2023 at 2:18 PM Ilpo Järvinen
<ilpo.jarvinen@linux.intel.com> wrote:
>
> Add a thermal cooling driver to provide path to access PCIe bandwidth
> controller using the usual thermal interfaces.
>
> A cooling device is instantiated for controllable PCIe ports from the
> bwctrl service driver.
>
> The thermal side state 0 means no throttling, i.e., maximum supported
> PCIe speed.
>
> Signed-off-by: Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>

From the cooling device interface perspective

Acked-by: Rafael J. Wysocki <rafael@kernel.org>

> ---
>  MAINTAINERS                    |   1 +
>  drivers/pci/pcie/bwctrl.c      |  11 ++++
>  drivers/thermal/Kconfig        |  10 +++
>  drivers/thermal/Makefile       |   2 +
>  drivers/thermal/pcie_cooling.c | 107 +++++++++++++++++++++++++++++++++
>  include/linux/pci-bwctrl.h     |  15 +++++
>  6 files changed, 146 insertions(+)
>  create mode 100644 drivers/thermal/pcie_cooling.c
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index d2eed2883a43..a0b40253fd5a 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -16421,6 +16421,7 @@ PCIE BANDWIDTH CONTROLLER
>  M:     Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>
>  S:     Supported
>  F:     drivers/pci/pcie/bwctrl.c
> +F:     drivers/thermal/pcie_cooling.c
>  F:     include/linux/pci-bwctrl.h
>
>  PCIE DRIVER FOR AMAZON ANNAPURNA LABS
> diff --git a/drivers/pci/pcie/bwctrl.c b/drivers/pci/pcie/bwctrl.c
> index e3172d69476f..13c73546244e 100644
> --- a/drivers/pci/pcie/bwctrl.c
> +++ b/drivers/pci/pcie/bwctrl.c
> @@ -34,9 +34,11 @@
>  /**
>   * struct bwctrl_service_data - PCIe Port Bandwidth Controller
>   * @set_speed_mutex: serializes link speed changes
> + * @cdev: thermal cooling device associated with the port
>   */
>  struct bwctrl_service_data {
>         struct mutex set_speed_mutex;
> +       struct thermal_cooling_device *cdev;
>  };
>
>  static bool bwctrl_valid_pcie_speed(enum pci_bus_speed speed)
> @@ -253,8 +255,16 @@ static int pcie_bandwidth_notification_probe(struct pcie_device *srv)
>         pcie_enable_link_bandwidth_notification(port);
>         pci_info(port, "enabled with IRQ %d\n", srv->irq);
>
> +       data->cdev = pcie_cooling_device_register(port, srv);
> +       if (IS_ERR(data->cdev)) {
> +               ret = PTR_ERR(data->cdev);
> +               goto disable_notifications;
> +       }
>         return 0;
>
> +disable_notifications:
> +       pcie_disable_link_bandwidth_notification(srv->port);
> +       kfree(data);
>  free_irq:
>         free_irq(srv->irq, srv);
>         return ret;
> @@ -264,6 +274,7 @@ static void pcie_bandwidth_notification_remove(struct pcie_device *srv)
>  {
>         struct bwctrl_service_data *data = get_service_data(srv);
>
> +       pcie_cooling_device_unregister(data->cdev);
>         pcie_disable_link_bandwidth_notification(srv->port);
>         free_irq(srv->irq, srv);
>         mutex_destroy(&data->set_speed_mutex);
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 19a4b33cb564..7deda3a0237d 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -219,6 +219,16 @@ config DEVFREQ_THERMAL
>
>           If you want this support, you should say Y here.
>
> +config PCIE_THERMAL
> +       bool "PCIe cooling support"
> +       depends on PCIEPORTBUS
> +       select PCIE_BW
> +       help
> +         This implements PCIe cooling mechanism through bandwidth reduction
> +         for PCIe devices.
> +
> +         If you want this support, you should say Y here.
> +
>  config THERMAL_EMULATION
>         bool "Thermal emulation mode support"
>         help
> diff --git a/drivers/thermal/Makefile b/drivers/thermal/Makefile
> index 058664bc3ec0..065972a08c84 100644
> --- a/drivers/thermal/Makefile
> +++ b/drivers/thermal/Makefile
> @@ -30,6 +30,8 @@ thermal_sys-$(CONFIG_CPU_IDLE_THERMAL)        += cpuidle_cooling.o
>  # devfreq cooling
>  thermal_sys-$(CONFIG_DEVFREQ_THERMAL) += devfreq_cooling.o
>
> +thermal_sys-$(CONFIG_PCIE_THERMAL) += pcie_cooling.o
> +
>  obj-$(CONFIG_K3_THERMAL)       += k3_bandgap.o k3_j72xx_bandgap.o
>  # platform thermal drivers
>  obj-y                          += broadcom/
> diff --git a/drivers/thermal/pcie_cooling.c b/drivers/thermal/pcie_cooling.c
> new file mode 100644
> index 000000000000..d86265c03e80
> --- /dev/null
> +++ b/drivers/thermal/pcie_cooling.c
> @@ -0,0 +1,107 @@
> +// SPDX-License-Identifier: GPL-2.0-only
> +/*
> + * PCIe cooling device
> + *
> + * Copyright (C) 2023 Intel Corporation.
> + */
> +
> +#include <linux/build_bug.h>
> +#include <linux/err.h>
> +#include <linux/kernel.h>
> +#include <linux/pci.h>
> +#include <linux/pci-bwctrl.h>
> +#include <linux/slab.h>
> +#include <linux/string.h>
> +#include <linux/thermal.h>
> +
> +#define COOLING_DEV_TYPE_PREFIX                "PCIe_Port_"
> +
> +struct pcie_cooling_device {
> +       struct pci_dev *port;
> +       struct pcie_device *pdev;
> +};
> +
> +static int pcie_cooling_get_max_level(struct thermal_cooling_device *cdev, unsigned long *state)
> +{
> +       struct pcie_cooling_device *pcie_cdev = cdev->devdata;
> +
> +       /* cooling state 0 is same as the maximum PCIe speed */
> +       *state = pcie_cdev->port->subordinate->max_bus_speed - PCIE_SPEED_2_5GT;
> +
> +       return 0;
> +}
> +
> +static int pcie_cooling_get_cur_level(struct thermal_cooling_device *cdev, unsigned long *state)
> +{
> +       struct pcie_cooling_device *pcie_cdev = cdev->devdata;
> +
> +       /* cooling state 0 is same as the maximum PCIe speed */
> +       *state = cdev->max_state -
> +                (pcie_cdev->port->subordinate->cur_bus_speed - PCIE_SPEED_2_5GT);
> +
> +       return 0;
> +}
> +
> +static int pcie_cooling_set_cur_level(struct thermal_cooling_device *cdev, unsigned long state)
> +{
> +       struct pcie_cooling_device *pcie_cdev = cdev->devdata;
> +       enum pci_bus_speed speed;
> +
> +       /* cooling state 0 is same as the maximum PCIe speed */
> +       speed = (cdev->max_state - state) + PCIE_SPEED_2_5GT;
> +
> +       return bwctrl_set_current_speed(pcie_cdev->pdev, speed);
> +}
> +
> +static struct thermal_cooling_device_ops pcie_cooling_ops = {
> +       .get_max_state = pcie_cooling_get_max_level,
> +       .get_cur_state = pcie_cooling_get_cur_level,
> +       .set_cur_state = pcie_cooling_set_cur_level,
> +};
> +
> +struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
> +                                                           struct pcie_device *pdev)
> +{
> +       struct pcie_cooling_device *pcie_cdev;
> +       struct thermal_cooling_device *cdev;
> +       size_t name_len;
> +       char *name;
> +
> +       pcie_cdev = kzalloc(sizeof(*pcie_cdev), GFP_KERNEL);
> +       if (!pcie_cdev)
> +               return ERR_PTR(-ENOMEM);
> +
> +       pcie_cdev->port = port;
> +       pcie_cdev->pdev = pdev;
> +
> +       name_len = strlen(COOLING_DEV_TYPE_PREFIX) + strlen(pci_name(port)) + 1;
> +       name = kzalloc(name_len, GFP_KERNEL);
> +       if (!name) {
> +               kfree(pcie_cdev);
> +               return ERR_PTR(-ENOMEM);
> +       }
> +
> +       snprintf(name, name_len, COOLING_DEV_TYPE_PREFIX "%s", pci_name(port));
> +       cdev = thermal_cooling_device_register(name, pcie_cdev, &pcie_cooling_ops);
> +       kfree(name);
> +
> +       return cdev;
> +}
> +
> +void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
> +{
> +       struct pcie_cooling_device *pcie_cdev = cdev->devdata;
> +
> +       thermal_cooling_device_unregister(cdev);
> +       kfree(pcie_cdev);
> +}
> +
> +/* For bus_speed <-> state arithmetic */
> +static_assert(PCIE_SPEED_2_5GT + 1 == PCIE_SPEED_5_0GT);
> +static_assert(PCIE_SPEED_5_0GT + 1 == PCIE_SPEED_8_0GT);
> +static_assert(PCIE_SPEED_8_0GT + 1 == PCIE_SPEED_16_0GT);
> +static_assert(PCIE_SPEED_16_0GT + 1 == PCIE_SPEED_32_0GT);
> +static_assert(PCIE_SPEED_32_0GT + 1 == PCIE_SPEED_64_0GT);
> +
> +MODULE_AUTHOR("Ilpo Järvinen <ilpo.jarvinen@linux.intel.com>");
> +MODULE_DESCRIPTION("PCIe cooling driver");
> diff --git a/include/linux/pci-bwctrl.h b/include/linux/pci-bwctrl.h
> index 46026fa25deb..366445517b72 100644
> --- a/include/linux/pci-bwctrl.h
> +++ b/include/linux/pci-bwctrl.h
> @@ -15,4 +15,19 @@ struct thermal_cooling_device;
>
>  int bwctrl_set_current_speed(struct pcie_device *srv, enum pci_bus_speed speed);
>
> +#ifdef CONFIG_PCIE_THERMAL
> +struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
> +                                                           struct pcie_device *pdev);
> +void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev);
> +#else
> +static inline struct thermal_cooling_device *pcie_cooling_device_register(struct pci_dev *port,
> +                                                                         struct pcie_device *pdev)
> +{
> +       return NULL;
> +}
> +static inline void pcie_cooling_device_unregister(struct thermal_cooling_device *cdev)
> +{
> +}
> +#endif
> +
>  #endif
> --
> 2.30.2
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
                   ` (9 preceding siblings ...)
  2023-08-17 12:17 ` [PATCH 10/10] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen
@ 2023-09-04  6:26 ` Krishna Chaitanya Chundru
  2023-09-04 11:16   ` Ilpo Järvinen
  10 siblings, 1 reply; 24+ messages in thread
From: Krishna Chaitanya Chundru @ 2023-09-04  6:26 UTC (permalink / raw)
  To: Ilpo Järvinen, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc
  Cc: linux-kernel, Srinivas Pandruvada, Alex Deucher


On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> Hi all,
>
> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> cooling driver to the thermal core side for limiting PCIe link speed
> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> port service driver. A cooling device is created for each port the
> service driver finds if they support changing speeds.

I see we had support for only link speed changes here but we need to add 
support for

link width change also as bandwidth notification from PCIe supports both 
link speed and link width.

- KC

>
> bwctrl now is built on top of BW notifications revert. I'm just not
> sure what is the best practice when re-adding some old functionality in
> a modified form so please let me know if I need to somehow alter that
> patch.
>
> The series is based on top of the RMW changes in pci/pcie-rmw.
>
> Ilpo Järvinen (10):
>    PCI: Protect Link Control 2 Register with RMW locking
>    drm/radeon: Use RMW accessors for changing LNKCTL2
>    drm/amdgpu: Use RMW accessors for changing LNKCTL2
>    drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
>    PCI: Store all PCIe Supported Link Speeds
>    PCI: Cache PCIe device's Supported Speed Vector
>    PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
>    PCI/bwctrl: Add "controller" part into PCIe bwctrl
>    thermal: Add PCIe cooling driver
>    selftests/pcie_bwctrl: Create selftests
>
>   MAINTAINERS                                   |   8 +
>   drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
>   drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
>   drivers/gpu/drm/radeon/cik.c                  |  40 +--
>   drivers/gpu/drm/radeon/si.c                   |  40 +--
>   drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
>   drivers/pci/pcie/Kconfig                      |   9 +
>   drivers/pci/pcie/Makefile                     |   1 +
>   drivers/pci/pcie/bwctrl.c                     | 309 ++++++++++++++++++
>   drivers/pci/pcie/portdrv.c                    |   9 +-
>   drivers/pci/pcie/portdrv.h                    |  10 +-
>   drivers/pci/probe.c                           |  38 ++-
>   drivers/pci/remove.c                          |   2 +
>   drivers/thermal/Kconfig                       |  10 +
>   drivers/thermal/Makefile                      |   2 +
>   drivers/thermal/pcie_cooling.c                | 107 ++++++
>   include/linux/pci-bwctrl.h                    |  33 ++
>   include/linux/pci.h                           |   3 +
>   include/uapi/linux/pci_regs.h                 |   1 +
>   tools/testing/selftests/Makefile              |   1 +
>   tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
>   .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 +++++++
>   .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++
>   23 files changed, 795 insertions(+), 131 deletions(-)
>   create mode 100644 drivers/pci/pcie/bwctrl.c
>   create mode 100644 drivers/thermal/pcie_cooling.c
>   create mode 100644 include/linux/pci-bwctrl.h
>   create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
>   create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
>   create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
>

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-04  6:26 ` [PATCH 00/10] Add PCIe Bandwidth Controller Krishna Chaitanya Chundru
@ 2023-09-04 11:16   ` Ilpo Järvinen
  2023-09-11 13:21     ` Krishna Chaitanya Chundru
  0 siblings, 1 reply; 24+ messages in thread
From: Ilpo Järvinen @ 2023-09-04 11:16 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc, LKML,
	Srinivas Pandruvada, Alex Deucher

[-- Attachment #1: Type: text/plain, Size: 3874 bytes --]

On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:

> 
> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > Hi all,
> > 
> > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > cooling driver to the thermal core side for limiting PCIe link speed
> > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > port service driver. A cooling device is created for each port the
> > service driver finds if they support changing speeds.
> 
> I see we had support for only link speed changes here but we need to add
> support for
> 
> link width change also as bandwidth notification from PCIe supports both link
> speed and link width.

Hi,

Thanks for the comment. In case you mean that the changes in Link Width 
should be reported correctly, they already are since the sysfs interface 
reads them directly from LNKSTA register.

Or did you perhaps mean that Bandwidth Controller should support also 
changing Link Width? If this is the case I don't know how it can be 
realized so a pointer on how it can be achieved would be appreciated.

-- 
 i.

> > bwctrl now is built on top of BW notifications revert. I'm just not
> > sure what is the best practice when re-adding some old functionality in
> > a modified form so please let me know if I need to somehow alter that
> > patch.
> > 
> > The series is based on top of the RMW changes in pci/pcie-rmw.
> > 
> > Ilpo Järvinen (10):
> >    PCI: Protect Link Control 2 Register with RMW locking
> >    drm/radeon: Use RMW accessors for changing LNKCTL2
> >    drm/amdgpu: Use RMW accessors for changing LNKCTL2
> >    drm/IB/hfi1: Use RMW accessors for changing LNKCTL2
> >    PCI: Store all PCIe Supported Link Speeds
> >    PCI: Cache PCIe device's Supported Speed Vector
> >    PCI/LINK: Re-add BW notification portdrv as PCIe BW controller
> >    PCI/bwctrl: Add "controller" part into PCIe bwctrl
> >    thermal: Add PCIe cooling driver
> >    selftests/pcie_bwctrl: Create selftests
> > 
> >   MAINTAINERS                                   |   8 +
> >   drivers/gpu/drm/amd/amdgpu/cik.c              |  41 +--
> >   drivers/gpu/drm/amd/amdgpu/si.c               |  41 +--
> >   drivers/gpu/drm/radeon/cik.c                  |  40 +--
> >   drivers/gpu/drm/radeon/si.c                   |  40 +--
> >   drivers/infiniband/hw/hfi1/pcie.c             |  30 +-
> >   drivers/pci/pcie/Kconfig                      |   9 +
> >   drivers/pci/pcie/Makefile                     |   1 +
> >   drivers/pci/pcie/bwctrl.c                     | 309 ++++++++++++++++++
> >   drivers/pci/pcie/portdrv.c                    |   9 +-
> >   drivers/pci/pcie/portdrv.h                    |  10 +-
> >   drivers/pci/probe.c                           |  38 ++-
> >   drivers/pci/remove.c                          |   2 +
> >   drivers/thermal/Kconfig                       |  10 +
> >   drivers/thermal/Makefile                      |   2 +
> >   drivers/thermal/pcie_cooling.c                | 107 ++++++
> >   include/linux/pci-bwctrl.h                    |  33 ++
> >   include/linux/pci.h                           |   3 +
> >   include/uapi/linux/pci_regs.h                 |   1 +
> >   tools/testing/selftests/Makefile              |   1 +
> >   tools/testing/selftests/pcie_bwctrl/Makefile  |   2 +
> >   .../pcie_bwctrl/set_pcie_cooling_state.sh     | 122 +++++++
> >   .../selftests/pcie_bwctrl/set_pcie_speed.sh   |  67 ++++
> >   23 files changed, 795 insertions(+), 131 deletions(-)
> >   create mode 100644 drivers/pci/pcie/bwctrl.c
> >   create mode 100644 drivers/thermal/pcie_cooling.c
> >   create mode 100644 include/linux/pci-bwctrl.h
> >   create mode 100644 tools/testing/selftests/pcie_bwctrl/Makefile
> >   create mode 100755
> > tools/testing/selftests/pcie_bwctrl/set_pcie_cooling_state.sh
> >   create mode 100755 tools/testing/selftests/pcie_bwctrl/set_pcie_speed.sh
> > 
> 

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-04 11:16   ` Ilpo Järvinen
@ 2023-09-11 13:21     ` Krishna Chaitanya Chundru
  2023-09-11 15:47       ` Ilpo Järvinen
  0 siblings, 1 reply; 24+ messages in thread
From: Krishna Chaitanya Chundru @ 2023-09-11 13:21 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc, LKML,
	Srinivas Pandruvada, Alex Deucher


On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
>
>> On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
>>> Hi all,
>>>
>>> This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
>>> cooling driver to the thermal core side for limiting PCIe link speed
>>> due to thermal reasons. PCIe bandwidth controller is a PCI express bus
>>> port service driver. A cooling device is created for each port the
>>> service driver finds if they support changing speeds.
>> I see we had support for only link speed changes here but we need to add
>> support for
>>
>> link width change also as bandwidth notification from PCIe supports both link
>> speed and link width.
> Hi,
>
> Thanks for the comment. In case you mean that the changes in Link Width
> should be reported correctly, they already are since the sysfs interface
> reads them directly from LNKSTA register.
>
> Or did you perhaps mean that Bandwidth Controller should support also
> changing Link Width? If this is the case I don't know how it can be
> realized so a pointer on how it can be achieved would be appreciated.

Hi,

I didn't have any idea on how thermal framework works.

But as we are adding bandwidth controller support we need to add support 
for width change also, may be we are not using this now, but we may need 
it in the future.

We had similar use case based on the bandwidth requirement on devices 
like WLAN, the client try to reduce or increase the link speed and link 
width.

So in the bandwidth controller driver we can add support for link width 
also. So any client can easily use the driver to change link speed or 
width or both to reduce the power consumption.

Adding link width support should be similar to how you added the link 
speed supported.

Please correct me if I misunderstood something here.

Thanks & Regards,

Krishna Chaitanya.


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-11 13:21     ` Krishna Chaitanya Chundru
@ 2023-09-11 15:47       ` Ilpo Järvinen
  2023-09-11 16:14         ` srinivas pandruvada
  0 siblings, 1 reply; 24+ messages in thread
From: Ilpo Järvinen @ 2023-09-11 15:47 UTC (permalink / raw)
  To: Krishna Chaitanya Chundru, Rafael J. Wysocki,
	Srinivas Pandruvada, Daniel Lezcano, Zhang Rui, linux-pm
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc, LKML,
	Alex Deucher

[-- Attachment #1: Type: text/plain, Size: 2777 bytes --]

+ thermal people.

On Mon, 11 Sep 2023, Krishna Chaitanya Chundru wrote:
> On 9/4/2023 4:46 PM, Ilpo Järvinen wrote:
> > On Mon, 4 Sep 2023, Krishna Chaitanya Chundru wrote:
> > > On 8/17/2023 5:46 PM, Ilpo Järvinen wrote:
> > > > 
> > > > This series adds PCIe bandwidth controller (bwctrl) and associated PCIe
> > > > cooling driver to the thermal core side for limiting PCIe link speed
> > > > due to thermal reasons. PCIe bandwidth controller is a PCI express bus
> > > > port service driver. A cooling device is created for each port the
> > > > service driver finds if they support changing speeds.
> > > I see we had support for only link speed changes here but we need to add
> > > support for
> > > 
> > > link width change also as bandwidth notification from PCIe supports both
> > > link
> > > speed and link width.
> > Hi,
> > 
> > Thanks for the comment. In case you mean that the changes in Link Width
> > should be reported correctly, they already are since the sysfs interface
> > reads them directly from LNKSTA register.
> > 
> > Or did you perhaps mean that Bandwidth Controller should support also
> > changing Link Width? If this is the case I don't know how it can be
> > realized so a pointer on how it can be achieved would be appreciated.
> 
> I didn't have any idea on how thermal framework works.
> 
> But as we are adding bandwidth controller support we need to add support for
> width change also, may be we are not using this now, but we may need it in the
> future.
> 
> We had similar use case based on the bandwidth requirement on devices like
> WLAN, the client try to reduce or increase the link speed and link width.
> 
> So in the bandwidth controller driver we can add support for link width also.
> So any client can easily use the driver to change link speed or width or both
> to reduce the power consumption.
> 
> Adding link width support should be similar to how you added the link speed
> supported.
> 
> Please correct me if I misunderstood something here.

Hi,

Okay, thanks for the clarification. So the point is to plan for adding 
support for Link Width later and currently only support throttling Link 
Speed. In any case, the Link Width control seems to be controlled using 
a different approach (Link Width change does not require Link Retraining).

I don't know either how such 2 dimensioned throttling (Link Speed and 
Link Width) is supposed to be realized using the thermal/cooling device 
interface which only provides a single integer as the current state. That 
is, whether to provide a single cooling device (with a single integer 
exposed to userspace) or separate cooling device for each dimension?

Perhaps thermal people could provide some insight on this? Is there some 
precedent I could take look at?

-- 
 i.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-11 15:47       ` Ilpo Järvinen
@ 2023-09-11 16:14         ` srinivas pandruvada
  2023-09-12 12:52           ` Ilpo Järvinen
  0 siblings, 1 reply; 24+ messages in thread
From: srinivas pandruvada @ 2023-09-11 16:14 UTC (permalink / raw)
  To: Ilpo Järvinen, Krishna Chaitanya Chundru, Rafael J. Wysocki,
	Daniel Lezcano, Zhang Rui, linux-pm
  Cc: linux-pci, Bjorn Helgaas, Lorenzo Pieralisi, Rob Herring,
	Krzysztof Wilczyński, Lukas Wunner, Alexandru Gagniuc, LKML,
	Alex Deucher

On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> + thermal people.
> 
> 

...

> Hi,
> 
> Okay, thanks for the clarification. So the point is to plan for
> adding 
> support for Link Width later and currently only support throttling
> Link 
> Speed. In any case, the Link Width control seems to be controlled
> using 
> a different approach (Link Width change does not require Link
> Retraining).
> 
> I don't know either how such 2 dimensioned throttling (Link Speed and
> Link Width) is supposed to be realized using the thermal/cooling
> device 
> interface which only provides a single integer as the current state.
> That 
> is, whether to provide a single cooling device (with a single integer
> exposed to userspace) or separate cooling device for each dimension?
> 
> Perhaps thermal people could provide some insight on this? Is there
> some 
> precedent I could take look at?
Yes. The processor cooling device does similar. 1-3 are reserved for P-
state and and 4-7 for T-states.

But I don't suggest using such method. This causes confusion and
difficult to change. For example if we increase range of P-state
control, then there is no way to know what is the start point of T-
states.

It is best to create to separate cooling devices for BW and link width.

Also there is a requirement that anything you add to thermal sysfs, it
should have some purpose for thermal control. I hope Link width control
is targeted to similar use case BW control.

Thanks,
Srinivas


> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-11 16:14         ` srinivas pandruvada
@ 2023-09-12 12:52           ` Ilpo Järvinen
  2023-09-12 17:45             ` srinivas pandruvada
  0 siblings, 1 reply; 24+ messages in thread
From: Ilpo Järvinen @ 2023-09-12 12:52 UTC (permalink / raw)
  To: srinivas pandruvada
  Cc: Krishna Chaitanya Chundru, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, linux-pm, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc, LKML, Alex Deucher

[-- Attachment #1: Type: text/plain, Size: 2201 bytes --]

On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > 
> > Okay, thanks for the clarification. So the point is to plan for
> > adding 
> > support for Link Width later and currently only support throttling
> > Link 
> > Speed. In any case, the Link Width control seems to be controlled
> > using 
> > a different approach (Link Width change does not require Link
> > Retraining).
> > 
> > I don't know either how such 2 dimensioned throttling (Link Speed and
> > Link Width) is supposed to be realized using the thermal/cooling
> > device 
> > interface which only provides a single integer as the current state.
> > That 
> > is, whether to provide a single cooling device (with a single integer
> > exposed to userspace) or separate cooling device for each dimension?
> > 
> > Perhaps thermal people could provide some insight on this? Is there
> > some 
> > precedent I could take look at?
>
> Yes. The processor cooling device does similar. 1-3 are reserved for P-
> state and and 4-7 for T-states.
> 
> But I don't suggest using such method. This causes confusion and
> difficult to change. For example if we increase range of P-state
> control, then there is no way to know what is the start point of T-
> states.

Yes. I understand it would be confusing.

> It is best to create to separate cooling devices for BW and link width.

Okay. If that's the case, then I see no reason to add the Link Width 
cooling device now as it could do nothing besides reporting the current 
link width.

The only question that then remains is how to take this into account in 
the naming of the cooling devices, currently PCIe_Port_<pci_name()> is 
used but perhaps it would be better to change that to 
PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be added 
later beside it?

> Also there is a requirement that anything you add to thermal sysfs, it
> should have some purpose for thermal control. I hope Link width control
> is targeted to similar use case BW control.

Ability to control Link Width seems to be part of PCIe 6.0 L0p. AFAICT, 
the reasons are to lower/control power consumption so it seems to be 
within scope.


-- 
 i.

^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-12 12:52           ` Ilpo Järvinen
@ 2023-09-12 17:45             ` srinivas pandruvada
  2023-09-12 18:08               ` srinivas pandruvada
  0 siblings, 1 reply; 24+ messages in thread
From: srinivas pandruvada @ 2023-09-12 17:45 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Krishna Chaitanya Chundru, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, linux-pm, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc, LKML, Alex Deucher

On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > 
> > 

[...]

> > But I don't suggest using such method. This causes confusion and
> > difficult to change. For example if we increase range of P-state
> > control, then there is no way to know what is the start point of T-
> > states.
> 
> Yes. I understand it would be confusing.
> 
> > It is best to create to separate cooling devices for BW and link
> > width.
> 
> Okay. If that's the case, then I see no reason to add the Link Width 
> cooling device now as it could do nothing besides reporting the
> current 
> link width.
> 
> The only question that then remains is how to take this into account
> in 
> the naming of the cooling devices, currently PCIe_Port_<pci_name()>
> is 
> used but perhaps it would be better to change that to 
> PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be added
> later beside it?
It is better in that way to add BW controller later.

Also adding separate cooling device will let thermal configuration,
choose different method at different thermal thresholds or all
together.

Thanks,
Srinivas

> 
> > Also there is a requirement that anything you add to thermal sysfs,
> > it
> > should have some purpose for thermal control. I hope Link width
> > control
> > is targeted to similar use case BW control.
> 
> Ability to control Link Width seems to be part of PCIe 6.0 L0p.
> AFAICT, 
> the reasons are to lower/control power consumption so it seems to be 
> within scope.
> 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

* Re: [PATCH 00/10] Add PCIe Bandwidth Controller
  2023-09-12 17:45             ` srinivas pandruvada
@ 2023-09-12 18:08               ` srinivas pandruvada
  0 siblings, 0 replies; 24+ messages in thread
From: srinivas pandruvada @ 2023-09-12 18:08 UTC (permalink / raw)
  To: Ilpo Järvinen
  Cc: Krishna Chaitanya Chundru, Rafael J. Wysocki, Daniel Lezcano,
	Zhang Rui, linux-pm, linux-pci, Bjorn Helgaas, Lorenzo Pieralisi,
	Rob Herring, Krzysztof Wilczyński, Lukas Wunner,
	Alexandru Gagniuc, LKML, Alex Deucher

On Tue, 2023-09-12 at 10:45 -0700, srinivas pandruvada wrote:
> On Tue, 2023-09-12 at 15:52 +0300, Ilpo Järvinen wrote:
> > On Mon, 11 Sep 2023, srinivas pandruvada wrote:
> > > On Mon, 2023-09-11 at 18:47 +0300, Ilpo Järvinen wrote:
> > > 
> > > 
> 
> [...]
> 
> > > But I don't suggest using such method. This causes confusion and
> > > difficult to change. For example if we increase range of P-state
> > > control, then there is no way to know what is the start point of
> > > T-
> > > states.
> > 
> > Yes. I understand it would be confusing.
> > 
> > > It is best to create to separate cooling devices for BW and link
> > > width.
> > 
> > Okay. If that's the case, then I see no reason to add the Link
> > Width 
> > cooling device now as it could do nothing besides reporting the
> > current 
> > link width.
> > 
> > The only question that then remains is how to take this into
> > account
> > in 
> > the naming of the cooling devices, currently PCIe_Port_<pci_name()>
> > is 
> > used but perhaps it would be better to change that to 
> > PCIe_Port_Link_Speed_... to allow PCI_Port_Link_Width_... to be
> > added
> > later beside it?
> It is better in that way to add BW 
sorry, link width controller

> controller later.
> 
> Also adding separate cooling device will let thermal configuration,
> choose different method at different thermal thresholds or all
> together.
> 
> Thanks,
> Srinivas
> 
> > 
> > > Also there is a requirement that anything you add to thermal
> > > sysfs,
> > > it
> > > should have some purpose for thermal control. I hope Link width
> > > control
> > > is targeted to similar use case BW control.
> > 
> > Ability to control Link Width seems to be part of PCIe 6.0 L0p.
> > AFAICT, 
> > the reasons are to lower/control power consumption so it seems to
> > be 
> > within scope.
> > 
> > 
> 


^ permalink raw reply	[flat|nested] 24+ messages in thread

end of thread, other threads:[~2023-09-12 18:08 UTC | newest]

Thread overview: 24+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-17 12:16 [PATCH 00/10] Add PCIe Bandwidth Controller Ilpo Järvinen
2023-08-17 12:16 ` [PATCH 01/10] PCI: Protect Link Control 2 Register with RMW locking Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 02/10] drm/radeon: Use RMW accessors for changing LNKCTL2 Ilpo Järvinen
2023-08-17 12:17   ` Ilpo Järvinen
2023-08-17 12:17   ` Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 03/10] drm/amdgpu: " Ilpo Järvinen
2023-08-17 12:17   ` Ilpo Järvinen
2023-08-17 12:17   ` Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 04/10] drm/IB/hfi1: " Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 05/10] PCI: Store all PCIe Supported Link Speeds Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 06/10] PCI: Cache PCIe device's Supported Speed Vector Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 07/10] PCI/LINK: Re-add BW notification portdrv as PCIe BW controller Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 08/10] PCI/bwctrl: Add "controller" part into PCIe bwctrl Ilpo Järvinen
2023-08-17 12:17 ` [PATCH 09/10] thermal: Add PCIe cooling driver Ilpo Järvinen
2023-08-23 19:47   ` Rafael J. Wysocki
2023-08-17 12:17 ` [PATCH 10/10] selftests/pcie_bwctrl: Create selftests Ilpo Järvinen
2023-09-04  6:26 ` [PATCH 00/10] Add PCIe Bandwidth Controller Krishna Chaitanya Chundru
2023-09-04 11:16   ` Ilpo Järvinen
2023-09-11 13:21     ` Krishna Chaitanya Chundru
2023-09-11 15:47       ` Ilpo Järvinen
2023-09-11 16:14         ` srinivas pandruvada
2023-09-12 12:52           ` Ilpo Järvinen
2023-09-12 17:45             ` srinivas pandruvada
2023-09-12 18:08               ` srinivas pandruvada

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.